Maintaining the closure property is an important
database language design objective. Queries issued in a
query language that possesses this property return
results that are structured and modeled by the same data
model for which the query language is designed. A useful
consequence is that the result of a query can be
uniformly operated on by other queries (i.e., using the
same language constructs). None of the existing query
languages that have been designed for the class of
object-oriented data models possesses the closure
property. In this dissertation, we introduce a new query
model for object-oriented databases in which this
property is preserved.

Furthermore, we make our query model concrete by
introducing an object-oriented query language named OQL
as an example of this query model. A query in this
language returns a subdatabase whose structure consists
of some selected object classes and their associations.
The objects that fall in the patterns of object
associations specified in the query constitute the
extension of the resulting subdatabase.

In this dissertation, we also introduce a knowledge
definition language for defining deductive rules and
integrity constraints pertaining to an object-oriented
database. Deductive rules in this language derive new
patterns of associations among objects based on existing
and/or other derived patterns. Deductive reasoning about
a large number of objects stored in a database is a
needed functionality in several new database application
domains (e.g., CAD/CAM databases).

OQL and the knowledge definition language are
tightly coupled. This facilitates the integration of
concepts and techniques that are typically found in
different categories of systems, such as database
management systems and expert systems, into one
integrated, object-oriented knowledge base management
system (OKBMS) that meets the specifications and
requirements of the new database application domains.

CHAPTER 1
INTRODUCTION

The limitations of the existing record-oriented data
models have long been observed [KEN79, HAM81, DAT82]. To
alleviate these limitations, several object-oriented (OO)
and semantic data models have been introduced as
potential alternatives for modeling many advanced
database applications such as CAD/CAM, office automation,
and multimedia databases [HAM81, KIN84, BAT85, SU86,
WOE86, FIS87, HUL87]. Semantic and OO data models can
capture much more of the semantics of these application
domains in a "natural" way. They provide a rich variety
of modeling constructs, which simplifies the task of
modeling complex application domains.

The term "OO data model" is used to refer to a data
model that is structurally and/or behaviorally
object-oriented [DIT86]. A structurally OO data model is one
that encompasses at least the following characteristics:

(1) It allows for defining aggregation hierarchies.

(2) It allows for defining generalization hierarchies.

(3) It supports the unique identification of objects,
    that is, each object is assumed to have a unique
    object identifier (surrogate).

The OO view of an application world is structurally
represented in the form of a network of classes and
associations (links), which can be aggregation or
generalization associations. Object classes can be either
"primitive" classes whose instances are of simple data
types (e.g., integer, string, real) or "non-primitive"
classes whose instances represent real-world objects
(e.g., part, ship, employee). At the extensional level,
instances of different classes can be related
(associated) with each other, forming patterns of object
associations. A behaviorally object-oriented data model,
on the other hand, is one in which operations that
describe the behavior of the objects of a class can be
defined and registered with that class.

The basic features that distinguish an OO view of a
database from a relational one can be summarized as
follows. First, objects in an OO database are represented
independently of their descriptive properties and of
their associations with other objects. Every object
(physical or conceptual) is represented in the database
by a globally unique object identifier (OID). Second,
associations among objects are represented directly
(explicitly) using their OIDs instead of indirectly
(implicitly), as in the relational data model, using
matching key and foreign key attribute values. Third,
information about an object is logically localized,
whereas in the relational model information about a
single object may become logically scattered among
several relations due to normalization. Fourth, object
classes that share common structural and operational
characteristics are grouped into generalization
hierarchies where properties, operations, and
relationships are inherited downward. Fifth, complex
objects, which model the physical aggregation of simple
objects, can be represented. A complex object is formally
defined as a hierarchy of exclusive component objects
[BAN87, KIM87]. Finally, just as an object can be an
instance of some class(es), a class can also be an
instance of some higher-order class(es). Thus, data and
metadata can be handled uniformly. In previous work, we
presented a model for metadata that facilitates such
uniform treatment [ALA85, SU88].
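The contrast between explicit, OID-based associations and implicit key/foreign-key associations can be sketched in Python. This is a minimal illustration, not part of any system discussed here: the class names, attributes, and the counter used to issue OIDs are hypothetical, and Python object references stand in for OIDs. The sketch also shows a generalization link (Manager inheriting from Employee).

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional, Tuple

# Hypothetical OID generator: each object receives a system-assigned identifier
# that is independent of its descriptive attribute values.
_oids = count(1)


@dataclass(eq=False)  # eq=False: identity, not attribute values, distinguishes objects
class DBObject:
    oid: int = field(default_factory=lambda: next(_oids), init=False)


@dataclass(eq=False)
class Department(DBObject):
    name: str = ""


@dataclass(eq=False)
class Employee(DBObject):
    name: str = ""
    # Aggregation link: the associated Department object is referenced directly.
    department: Optional[Department] = None


@dataclass(eq=False)
class Manager(Employee):
    # Generalization: Manager inherits name and department from Employee.
    budget: int = 0


# Relational counterpart: DEPT(d#, dname) and EMP(e#, ename, d#), where d# in EMP
# is a foreign key; the association exists only through matching values.
DEPT: List[Tuple[str, str]] = [("D1", "Engineering")]
EMP: List[Tuple[str, str, str]] = [("E1", "Smith", "D1")]


def dept_name_relational(e_no: str) -> List[str]:
    # An explicit join on key / foreign-key values is required.
    return [dname for (e, _, d) in EMP if e == e_no
            for (dk, dname) in DEPT if dk == d]


eng = Department(name="Engineering")
smith = Manager(name="Smith", department=eng, budget=100)
print(smith.department.name)        # navigation along the link, no join
print(dept_name_relational("E1"))   # same information via a value-based join
```

Using `eq=False` is a deliberate choice: two objects with identical attribute values remain distinct, as they would under unique object identification.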

These distinguishing features of an OO data model
can, if properly utilized, benefit three
categories of users. The first category is the
database designer, who gains a more powerful tool for
designing the conceptual schema of an application world.
The second category is the database user, who can be
provided with an OO view of the application world that
is more natural and comprehensible (especially if it is
graphically represented) than a relational view. In
addition, the user, if provided with a language based on
the OO data model, will be relieved of having to restate
most of the semantics of the application world in his/her
queries, since these semantics would already have been
captured in the schema of the database and stored in its
data dictionary. This in turn provides a higher degree of
logical data independence and increases the productivity
of the user. The third category of users is the
underlying DBMS, which can be designed to provide more
efficient storage structures, processing, and query
evaluation procedures that better serve the requirements
of this environment.

In the literature, three different approaches to
employing an OO data model can be identified based on the
intended categories of users to be served. These
approaches are summarized below.

(1) Semantic and OO data models have been used in the
    area of conceptual database design. This has been the
    traditional use of semantic data models because they
    provide richer and more natural modeling constructs.
    After a schema based on a semantic data model is
    designed, it is normally mapped into a representation
    using one of the record-oriented data models (e.g.,
    the relational data model) in order to implement it
    on an existing DBMS. The original schema is not used
    any further by the system. This approach
    benefits the first category of users cited above,
    i.e., the database designer.

(2) An OO data model can be used to provide a front-end
    component to an existing DBMS (e.g., a relational one).
    Queries posed against the OO view are translated by
    the front-end component into equivalent queries against
    the underlying relational schema. The retrieved data
    are assembled and restructured by the front-end
    component to reflect the semantics present in the OO
    schema. Examples of this approach in which the
    front-end component has an OO flavor are described in
    the implementations of the query languages ARIEL and
    GEM [MAC85, TSU84]. This approach serves both the
    database designer and the user. A similar approach has
    been described in [SHI81] for the functional data
    model and its language. The data modeling
    capabilities of the front-end component described in
    [SHI81] have been limited to those that can be easily
    translated into the underlying relational
    representation.

(3) An OO data model can be used as the underlying model
    of a complete DBMS. This approach is beneficial to
    all three categories of users cited above. Several
    prototype as well as commercial systems that adopt
    this approach have been described in the literature.
    These include Sembase [KIN84], ORION [BAN87], IRIS
    [FIS87], GEMSTONE [SER86], and Vbase [ONT87].

In order for the second and third of the above three
approaches to be successful, we need, among other things,
to define and introduce a new query model that is
appropriate for OO databases. In the past, several query
languages have been designed for the class of OO data
models [SHI81, ZAN83, TSU84, MAC85, SER86] (see Chapter 2
for a detailed survey). The advantage of these languages
over relational languages is that they make use of the
fact that objects in an OO database are explicitly
related using their OIDs (in the relational data model,
tuples are implicitly related using matching key and
foreign key attribute values). Therefore, users of these
query languages are, in most cases, relieved of having to
specify explicit joins in their queries.

In these existing query languages, a query is issued
against an object-oriented database (a database whose
structure is represented in the form of classes and their
associations), and it returns a relation (possibly a
non-normal-form relation) as its result. A major shortcoming
of these existing languages is that queries return
results that are not structurally represented and modeled
using the same data model with which the original
database is represented. In other words, the closure
property (defined in Chapter 3) is not preserved by these
languages. Therefore, the result of a query cannot be
further uniformly operated on (i.e., using the constructs
of the same query language) to produce other results that
satisfy additional qualification conditions.

In this dissertation, we introduce a new query model
for OO databases, which maintains the closure property
because the result of a query in this query model is not
a relation but rather a subdatabase (defined in Chapter 3)
that is represented using the same OO data model with
which the "base" data is modeled. Consequently, a
subdatabase can be further uniformly manipulated (i.e.,
just like the original database) to produce other
subdatabases that satisfy some additional restriction
conditions. Furthermore, based on our query model, we
introduce an object-oriented query language named OQL
that can be used for querying an OO database (Chapter 4).
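The closure property can be illustrated with a small Python sketch in which a (sub)database is a set of class extensions plus a set of association links, and a query is a function from a database to a database. This is neither OQL nor the query model of Chapter 3: the `Database` type and the `select_pattern` operator are hypothetical simplifications. Because the operator's result has the same type as its input, its output can be fed back into it, which a relation-returning language cannot do.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

ClassPair = Tuple[str, str]   # an association between two classes
OidPair = Tuple[int, int]     # an association between two objects (by OID)


@dataclass
class Database:
    extents: Dict[str, FrozenSet[int]]          # class name -> OIDs of its instances
    links: Dict[ClassPair, FrozenSet[OidPair]]  # class pair -> associated OID pairs


def select_pattern(db: Database, a: str, b: str,
                   keep: Callable[[int, int], bool]) -> Database:
    """Hypothetical query operator: keep the a-b associations that satisfy
    `keep`, plus the objects that take part in them. The result is again a
    Database (a subdatabase), so the operator can be applied to it again."""
    pairs = frozenset(p for p in db.links.get((a, b), frozenset()) if keep(*p))
    return Database(
        extents={a: frozenset(x for x, _ in pairs),
                 b: frozenset(y for _, y in pairs)},
        links={(a, b): pairs},
    )


base = Database(
    extents={"Section": frozenset({1, 2, 3}), "Instructor": frozenset({10, 11})},
    links={("Section", "Instructor"): frozenset({(1, 10), (2, 10), (3, 11)})},
)

# First query: sections taught by instructor 10.
sub1 = select_pattern(base, "Section", "Instructor", lambda s, i: i == 10)
# Second query operates on the first query's result, not on a table.
sub2 = select_pattern(sub1, "Section", "Instructor", lambda s, i: s != 2)
print(sub2.extents, sub2.links)
```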

We also introduce a knowledge definition language
that can be used for defining deductive rules (Chapter 5)
as well as integrity constraints (Chapter 6) pertaining
to an OO database. OQL and the knowledge
definition language are tightly coupled in the sense that
OQL constructs can be used in the definition of
rules (since the definition of a rule may involve
database retrieval operations to check whether objects in
the database satisfy some given conditions) and
the data derived by deductive rules can be manipulated
using OQL just like the base data.

The tight coupling of these two languages (i.e., OQL
and the knowledge definition language) facilitates the
implementation of an OO knowledge base management system
(OKBMS) based on these languages that has the deductive
reasoning capabilities typically found in expert systems
as well as the functionalities normally found in database
management systems (DBMSs).

Deductive reasoning about a large number of objects
stored in a database is a functionality that is needed in
several application domains (e.g., CAD/CAM databases).
Motivated by this need, several approaches for
integrating expert system and DBMS technologies have
been identified in the literature [RAS88]. The "interface
approach" [JAR84, CHA84] builds a bridge between an
existing relational DBMS and a Prolog-based inference
engine. The "extension approach" [SCI84, STO83, STO87],
on the other hand, extends either an existing DBMS with
deductive power or an existing Prolog-based system with
database facilities. These approaches suffer from two
major disadvantages: (1) poor operational performance,
and (2) the capabilities of the resulting system are
bound by the limitations inherent in the existing systems
to be integrated or extended.

A third approach, called the "integrated approach," is
proposed by other authors [MIS84, SU85, POT86, RAS88]. In
this approach, a knowledge base management system that
encapsulates both the capabilities of database management
systems and the deductive reasoning capabilities of
expert systems is built from scratch.
This approach does not suffer from the disadvantages of
the above two approaches. We believe that an important
cornerstone in such an integrated OO knowledge base
system is a language that not only supports the database
retrieval and storage operations but also allows for
defining deductive rules and integrity constraints.

This dissertation is organized as follows. After this
introduction, we survey in Chapter 2 the related work on
query languages, derived data, and constraint specification
in some of the existing data models and systems (database
systems and artificial intelligence systems). In Chapter
3, we introduce a new query model for OO databases that
maintains the closure property. The object-oriented query
language is described in Chapter 4. In Chapter 5, we
study the semantics of deductive rules in OO databases
and introduce the component of the knowledge definition
language that can be used for defining these rules. In
Chapter 6, we describe the types of integrity constraints
that need to be specified for OO databases and describe
the component of the knowledge definition language that
can be used for defining such constraints. The
implementation techniques of an OKBMS are presented in
Chapter 7. Some conclusions and possible future
extensions to this work are highlighted in Chapter 8. In
Appendix A, we give the BNF grammar for both OQL and
the knowledge definition language. Finally, the BNF
grammar for defining a knowledge base schema using the OO
semantic association model OSAM* is given in Appendix B.


CHAPTER 2

SURVEY OF RELATED WORK


In this chapter, we survey some of the existing work
related to the languages introduced in this dissertation.
In Section 2.1, we describe the existing query languages
that have been designed for the class of OO data models.
In Section 2.2, we survey work on deductive inference in
relational databases, artificial intelligence systems,
and OO data models. Some of the existing constraint
specification languages are briefly described in Section
2.3.


2.1 Existing OO Query Languages

In this section, we survey a representative sample
of the existing query languages for the class of OO and
semantic data models. The goal of this survey is to show
that the existing languages do not maintain the closure
property (as described in Chapter 1). The input to a
query in these languages has an OO representation in the
form of classes and their associations, while the output
is simply a relation.

Most of these languages have emerged as extensions
of the capabilities of the relational language QUEL
[ZAN83, TSU84, STO84]. In QUEL, the "dot" is used to
qualify an attribute by its relation name. For example,
R1.A refers to attribute A that belongs to relation R1.
In the approach described by Stonebraker et al.
[STO84], QUEL is used as a data type, which facilitates
referencing one relation from another using the dot
notation, as in the expression "X.Y.Z". In this
expression, X is a relation and Y is a field in relation X
of data type QUEL. The query stored in Y retrieves
attributes from another relation, say Y', that is
semantically related to X. One of the attributes that the
stored QUEL query retrieves is Z. Thus, the expression
X.Y.Z returns the values of the Z attribute in the Y'
relation for the Y' tuples that are related (as
determined by the stored QUEL query) to some tuples of
the relation X. The tuples of the relation X can be
qualified in the WHERE clause by conditions that also use
the extended dot notation in the same way, such as
"X.J.K > 5".
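The evaluation of an expression such as X.Y.Z over a field of type QUEL can be mimicked in Python by storing, in each tuple of X, a callable that plays the role of the stored QUEL query. This is a hypothetical sketch: the relation contents, the use of callables instead of stored query text, and the existential reading of a qualification over several returned values are all assumptions made for illustration. For brevity, the qualification reuses field Y rather than a separate field J.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Relation Y' (written Yp here) with attributes id, Z, and K.
Yp: List[Row] = [
    {"id": 1, "Z": "alpha", "K": 7},
    {"id": 2, "Z": "beta", "K": 3},
]

# Relation X: field Y holds a stored query (here a callable) that retrieves
# the Y' tuples related to this X tuple.
X: List[Row] = [
    {"name": "x1", "Y": lambda: [t for t in Yp if t["id"] == 1]},
    {"name": "x2", "Y": lambda: [t for t in Yp if t["id"] == 2]},
]


def dot(x_tuple: Row, quel_field: str, attr: str) -> List[Any]:
    """Evaluate x.quel_field.attr: run the stored query, then project attr."""
    stored_query: Callable[[], List[Row]] = x_tuple[quel_field]
    return [t[attr] for t in stored_query()]


# Retrieve X.Y.Z where X.Y.K > 5 (an X tuple qualifies if any related K > 5).
result = [z for x in X
          if any(k > 5 for k in dot(x, "Y", "K"))
          for z in dot(x, "Y", "Z")]
print(result)
```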

The language GEM [ZAN83, TSU84] is based on an OO
data model (a model that at least supports aggregation,
generalization, and unique identification of objects). In
GEM, the dot notation is used for navigation along the
aggregation links that connect classes (this is similar
to the approach described above [STO84]).

Figure 2.1 (the figures of each chapter are shown at its
end) shows a simple schema in which classes are
represented as nodes and attributes as links. We shall
give some example queries written in GEM and the other
query languages in order to demonstrate their
similarities and differences.

The following is a GEM query which, for each
section that is taught by an instructor who is employed
by a department that belongs to the college of
engineering, retrieves the name of the instructor of the
section, the ss# of the student(s) of the section, and
the c# of the course of the section.

    RETRIEVE Section.Instructor.name,
             Section.Student.ss#,
             Section.Course.c#
    WHERE    Section.Instructor.Department.College = "Engineering"

The central class in this query (i.e., the class at
which each of the expressions in the RETRIEVE and WHERE
clauses starts) is Section. This query returns a set of
tuples, where each tuple contains values for the
attributes name, ss#, and c# that are related to a single
section.
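The semantics of this GEM query, and in particular its flat, relation-shaped result, can be sketched in Python. The classes below are a hypothetical rendering of part of the schema in Figure 2.1 (College is treated as a string-valued attribute of Department, as the WHERE clause suggests), and producing one tuple per student of a section is an assumption about how the multi-valued Student link is flattened.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Department:
    college: str


@dataclass
class Instructor:
    name: str
    department: Department


@dataclass
class Student:
    ss_no: str


@dataclass
class Course:
    c_no: str


@dataclass
class Section:
    instructor: Instructor
    course: Course
    students: List[Student] = field(default_factory=list)


def gem_query(sections: List[Section]) -> Set[Tuple[str, str, str]]:
    # RETRIEVE Section.Instructor.name, Section.Student.ss#, Section.Course.c#
    # WHERE Section.Instructor.Department.College = "Engineering"
    return {
        (s.instructor.name, st.ss_no, s.course.c_no)
        for s in sections                      # Section is the central class
        if s.instructor.department.college == "Engineering"
        for st in s.students
    }


eng, arts = Department("Engineering"), Department("Arts")
sections = [
    Section(Instructor("Lee", eng), Course("C101"), [Student("111"), Student("222")]),
    Section(Instructor("Kay", arts), Course("A200"), [Student("333")]),
]
print(sorted(gem_query(sections)))   # a set of flat tuples, i.e., a relation
```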

ARIEL [MAC85] is very similar to GEM, but it uses the
'OF' notation in addition to the dot notation. Using the
'OF' notation, an expression such as "Employee.name" is
expressed as "name OF Employee", which, linguistically,
provides a more natural representation. Thus, the above
query can be expressed in ARIEL as follows.

    RETRIEVE name OF Instructor OF Section,
             ss# OF Student OF Section,
             c# OF Course OF Section
    WHERE    College OF Department OF Instructor OF Section
             = "Engineering"

The data retrieval component of the functional data
language DAPLEX [SHI81], though developed independently,
is similar to the languages described above. In the
functional paradigm, a link (i.e., an attribute)
emanating from a class is interpreted as a function
which, when applied to an object from that class, returns
the related object from the class pointed at by the link.
For example, the above query can be expressed in DAPLEX
as follows (assuming that if a link is not explicitly
named, it shall, by default, have the name of the class
to which it points).

    FOR EACH Section
    SUCH THAT College (Department (Instructor (Section))) = "Engineering"
    PRINT Name (Instructor (Section)),
          ss# (Student (Section)),
          c# (Course (Section))

In the expression "Name (Instructor (Section))",
Instructor is a function which, when applied to an
instance of the class Section, returns the related
instance of the class Instructor. The function Name is
then applied to this Instructor instance to return the
corresponding name. Though DAPLEX is based on the
functional paradigm, the structure of the returned data
is a relation, just as in GEM.
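The functional reading can be sketched by modeling each link as a Python function from an object identifier to the related object or value, and evaluating the DAPLEX query by function composition. The lookup tables and the lowercase function names are hypothetical, and the Student link is omitted to keep the sketch short. As noted above, the output is still a list of flat tuples.

```python
from typing import Dict, List, Tuple

# Hypothetical extension data; keys and values are object identifiers or values.
_INSTRUCTOR: Dict[str, str] = {"s1": "i1", "s2": "i2"}   # Section -> Instructor
_COURSE: Dict[str, str] = {"s1": "c1", "s2": "c2"}       # Section -> Course
_DEPARTMENT: Dict[str, str] = {"i1": "d1", "i2": "d2"}   # Instructor -> Department
_COLLEGE: Dict[str, str] = {"d1": "Engineering", "d2": "Arts"}
_NAME: Dict[str, str] = {"i1": "Lee", "i2": "Kay"}
_C_NO: Dict[str, str] = {"c1": "C101", "c2": "A200"}


# Each link is a function that, applied to an object, returns the related object.
def instructor(section: str) -> str:
    return _INSTRUCTOR[section]


def course(section: str) -> str:
    return _COURSE[section]


def department(instr: str) -> str:
    return _DEPARTMENT[instr]


def college(dept: str) -> str:
    return _COLLEGE[dept]


def name(instr: str) -> str:
    return _NAME[instr]


def c_no(crs: str) -> str:
    return _C_NO[crs]


def daplex_query(sections: List[str]) -> List[Tuple[str, str]]:
    # FOR EACH Section
    # SUCH THAT College(Department(Instructor(Section))) = "Engineering"
    # PRINT Name(Instructor(Section)), c#(Course(Section))
    return [(name(instructor(s)), c_no(course(s)))
            for s in sections
            if college(department(instructor(s))) == "Engineering"]


print(daplex_query(["s1", "s2"]))
```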

The query language introduced in [BAN88] is based on
the message passing paradigm. In this language, the name
of a link emanating from a class is interpreted as the
name of a message that the class recognizes (one can
assume that there is actually a message that is created
by the system, stored with the class, and that has the
same name as the attribute). When such a message is sent
to an instance of the class, it returns the value of the
attribute. For example, the following is an expression to
select sections that are taught by instructors who are
employed by any department of the college of engineering
(by default, a link has the same name as the class it
points to unless it is explicitly named).

    (Section SELECT :S (:S Instructor Department College
                        = "Engineering"))

In this expression, SELECT is a message sent to the
class Section. The argument :S, which is the first
argument of SELECT, is the name of an iteration variable.
The SELECT message iterates over the instances of the
class Section, and :S is bound to one instance at a time.
The second argument of SELECT, which is enclosed in
parentheses, is a block of code that is executed for each
value bound to :S. In this block, the message Instructor
is sent to the instance bound to :S to return the related
Instructor instance. Similarly, Department and College
are messages (Department is sent to the returned
Instructor instance, and College is sent to the returned
Department instance). The message "=" is then sent to the
resulting college instance (a college instance here is a
string of characters representing the name of the
college) with the argument "Engineering". The result is
the logical object TRUE or FALSE. The logical AND or OR
message can be sent to this object with an argument that
specifies some other condition on the instances of
Section (these conditions also have to start from Section
and branch from there using messages). In principle,
though not described by Banerjee et al. [BAN88], similar
message-based expressions can be used to retrieve
attribute values that are related to these Section
instances. The result of a query that may involve such
conditions is the set of instances of Section that
satisfy the specified conditions.
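The SELECT message can be approximated in Python, where attribute access plays the role of a unary message and a lambda plays the role of the block whose parameter is the iteration variable :S. This is a hypothetical sketch, not the interface of [BAN88]: the explicit class-level extent list, the `select` method name, and the `number` attribute are assumptions. Python's `and` stands in for the AND message that combines conditions.

```python
from types import SimpleNamespace
from typing import Callable, List


class Section:
    extent: List["Section"] = []   # all instances of the class (kept explicitly here)

    def __init__(self, number: int, instructor: SimpleNamespace) -> None:
        self.number = number
        self.instructor = instructor
        Section.extent.append(self)

    @classmethod
    def select(cls, block: Callable[["Section"], bool]) -> List["Section"]:
        # Bind each instance in turn to the block's parameter (the role of :S)
        # and keep the instances for which the block evaluates to True.
        return [s for s in cls.extent if block(s)]


eng = SimpleNamespace(college="Engineering")
arts = SimpleNamespace(college="Arts")
Section(101, SimpleNamespace(department=eng))
Section(202, SimpleNamespace(department=eng))
Section(303, SimpleNamespace(department=arts))

# (Section SELECT :S (:S Instructor Department College = "Engineering"))
in_engineering = Section.select(
    lambda s: s.instructor.department.college == "Engineering")

# A second condition combined with AND; the result is still a set of Sections.
upper_level = Section.select(
    lambda s: s.instructor.department.college == "Engineering" and s.number > 150)

print([s.number for s in in_engineering], [s.number for s in upper_level])
```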

As the above query languages show, a query
operates on a database that is structurally represented
using an OO data model and returns a result whose
structure is represented in the form of a table.
Therefore, the result of a query cannot be further
uniformly manipulated by other queries to produce new
results. In Chapter 3, we introduce a query model for OO
databases in which a query returns a result that is
structurally represented using the constructs of an OO
data model.


2.2 Deductive Databases

A deductive database is defined by Gallaire et al.
[GAL84] as "a database in which new facts may be derived
from facts that were explicitly introduced." Data
deduction as defined above is supported in three
different research areas, namely, deductive relational
databases, rule systems in artificial intelligence, and
OO and semantic data models. In this section, we survey
only some representative samples of work that has been
done in each of these three areas (a complete survey
would be longer than this entire dissertation).

2.2.1 Deductive Relational Databases

A substantial portion of research in the literature
has focused on extending relational databases with
logical (PROLOG-like) inferencing capabilities [GAL78,
CHA84, VAS84, GAL84, ULL85, MAI87]. In this approach,
data are defined and stored in the system in the form of
base relations. Inference rules are defined as a logic
program against these relations. User queries are
transformed, based on these rules, into queries against
base relations. This approach is possible because a
relational database can be considered, from a logical
point of view, as a special first-order theory.

In [ULL85], deductive rules are represented as
database view definitions in the form of Horn clauses.
The following is an example database and some deductive
rules as presented in [ULL85]. The database used consists
of the two relations EDS (E, D, S), which represents
employees, their departments, and salaries, and DM (D, M),
which represents departments and their managers. The
following two rules define a "secure" version of the EDS
relation, in which salaries that are more than 100K are
replaced by 0.

    SecureEDS (e, d, s) :- EDS (e, d, s), s <= 100,000
    SecureEDS (e, d, 0) :- EDS (e, d, s), s > 100,000

The first rule states that the tuple <e,d,s> appears in
SecureEDS if it is in EDS and s is less than or equal to
100K, while the second rule states that the tuple <e,d,0>
appears in SecureEDS if it is in EDS and the value of s
is greater than 100K.

The following rule against the same database defines
a view that relates employees, their salaries, and their
managers, where the manager of an employee is any manager
of any department the employee works for.

    ViewESM (e, s, m) :- SecureEDS (e, d, s), DM (d, m)

The query "Retrieve all the employees that work for Mike
and their salaries" can be expressed as

    ViewESM (e, s, "Mike").
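Because these rules are non-recursive, their meaning can be reproduced bottom-up with ordinary set comprehensions. The Python sketch below only illustrates what the rules derive; it is not the evaluation strategy of [ULL85], and the sample EDS and DM tuples are invented for the example.

```python
from typing import Set, Tuple

EDSRow = Tuple[str, str, int]

EDS: Set[EDSRow] = {                 # EDS(E, D, S): hypothetical sample data
    ("Ann", "Toys", 90_000),
    ("Bob", "Toys", 150_000),
    ("Cid", "Shoes", 70_000),
}
DM: Set[Tuple[str, str]] = {("Toys", "Mike"), ("Shoes", "Sue")}   # DM(D, M)


def secure_eds(eds: Set[EDSRow]) -> Set[EDSRow]:
    # SecureEDS(e, d, s) :- EDS(e, d, s), s <= 100,000
    kept = {(e, d, s) for (e, d, s) in eds if s <= 100_000}
    # SecureEDS(e, d, 0) :- EDS(e, d, s), s > 100,000
    masked = {(e, d, 0) for (e, d, s) in eds if s > 100_000}
    return kept | masked


def view_esm(secure: Set[EDSRow],
             dm: Set[Tuple[str, str]]) -> Set[Tuple[str, int, str]]:
    # ViewESM(e, s, m) :- SecureEDS(e, d, s), DM(d, m)   (join on d)
    return {(e, s, m) for (e, d, s) in secure for (d2, m) in dm if d == d2}


# ViewESM(e, s, "Mike"): bind m to "Mike" and return the (e, s) pairs.
mike = {(e, s) for (e, s, m) in view_esm(secure_eds(EDS), DM) if m == "Mike"}
print(sorted(mike))
```

Bob's salary exceeds 100K, so the second SecureEDS rule masks it to 0 before the join with DM.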

The coupling approach, in which an interface is
built between a relational DBMS and logic-based systems,
has been adopted by other researchers. For example, the
approach taken by Chang and Walker [CHA84] is that of
coupling a PROLOG system with the relational database
system SQL/DS, in which assertions (facts) are stored. An
interface to SQL/DS called PROSQL is built based on
PROLOG and is used to query the underlying SQL/DS system
in addition to providing the conventional PROLOG
inferencing capabilities. A PROSQL program is a PROLOG
program that uses a special predicate called SQL in
addition to the normal PROLOG syntax. The argument of
this predicate represents a call, written in the query
language SQL, to the underlying SQL/DS system. Depending
on the SQL query passed to it, SQL/DS may create tables
or views, insert tuples into SQL/DS, or retrieve tuples
from SQL/DS. In the case of retrieval, the retrieved tuples
are deposited in the PROLOG work space as assertions
about the predicate that has the same name as the table
referenced in the FROM clause of the passed query. For
example, if the relation Q (A, B) that has the extension
{(a1, b1), (a2, b2)} is stored in SQL/DS, then, if the
following SQL predicate appears as part of a PROSQL
program,

    SQL ('SELECT A, B
          FROM Q')

it will cause the following two assertions to be inserted
in the PROLOG work space:

    Q (a1, b1)
    Q (a2, b2)

These assertions are then treated normally in the rest of
the PROLOG program.
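The retrieval case of the SQL predicate can be imitated in Python, with the standard `sqlite3` module standing in for SQL/DS and a dictionary of sets standing in for the PROLOG work space. The function `sql_predicate` is hypothetical, and finding the predicate name with a regular expression assumes a single table in the FROM clause.

```python
import re
import sqlite3
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

Workspace = DefaultDict[str, Set[Tuple]]


def sql_predicate(conn: sqlite3.Connection, query: str, workspace: Workspace) -> bool:
    """Hypothetical analogue of PROSQL's SQL predicate: pass the query to the
    DBMS; for a SELECT, deposit each retrieved tuple as an assertion about the
    predicate named after the table in the FROM clause."""
    cursor = conn.execute(query)
    if query.lstrip().upper().startswith("SELECT"):
        table = re.search(r"\bFROM\s+(\w+)", query, re.IGNORECASE).group(1)
        workspace[table].update(cursor.fetchall())
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Q (A TEXT, B TEXT)")
conn.executemany("INSERT INTO Q VALUES (?, ?)", [("a1", "b1"), ("a2", "b2")])

work_space: Workspace = defaultdict(set)
sql_predicate(conn, "SELECT A, B FROM Q", work_space)
print(sorted(work_space["Q"]))   # the assertions Q(a1, b1) and Q(a2, b2)
```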

As an example, let R (B, C) be another relation, in
addition to Q (A, B), that is stored in SQL/DS. The
following PROSQL program defines a predicate T, which
relates attribute A of relation Q with attribute C of
relation R through their common attribute B and returns
the resulting (A, C) tuples.

    SQL ('SELECT A, B FROM Q') ?
    SQL ('SELECT B, C FROM R') ?
    T (A, C) <- Q (A, B), R (B, C)
    T (A, C) ?

By making use of the CREATE VIEW command of SQL, the
above program can alternatively be written as follows:

    SQL ('CREATE VIEW T (A, C) AS
          SELECT A, C
          FROM Q, R
          WHERE Q.B = R.B') ?
    SQL ('SELECT A, C FROM T') ?
    T (A, C) ?

This version of the program is more efficient than the
first version if tables Q and R contain many tuples,
because it makes use of the query optimization
facility of the underlying relational system and only
the resulting (A, C) tuples are transferred to PROLOG.
Also, when many tuples are processed, the PROLOG
work space may not be large enough to hold them. We
note that the CREATE VIEW facility of SQL/DS can be used
only with non-recursive PROLOG rules. Whang and Navathe
[WHA87] have proposed a decomposition of PROLOG requests
into disjunctive normal form expressions so that
accesses to the relational database can be minimized.

A similar system, an "optimizing PROLOG front-end to a
relational query system," is described by Jarke et al.
[JAR84]. This system allows for the efficient exchange of
queries and/or data between a PROLOG-based expert system
and a relational database system. In this approach, Jarke
et al. introduced an intermediate language called DBCL
between PROLOG and the back-end relational system. PROLOG
data requests are first translated into DBCL statements.
Syntactic and semantic optimization of these DBCL
statements is then performed. The resulting optimized
DBCL statements are independent of the query language of
the underlying relational system, which facilitates the
portability of the optimizing PROLOG front-end. These DBCL
statements are then translated into queries in the query
language of the target relational system (e.g., SQL). The
retrieved data are loaded into the PROLOG work space and
treated normally in PROLOG programs.

In a different approach, Maindreville and Simon
[MAI87] integrated a production rule language called RDL1
with a relational DBMS. A production rule in RDL1
consists of a conditional part and an action part and has
the form Condition --> Action. A condition is a
well-formed formula of the tuple relational calculus. An
action can have one of the following forms, assuming that
R(A1, A2, ..., An) is a relation with attributes A1 to
An.

(1) +R (A1 = t1, A2 = t2, ..., An = tn) is an action
    that adds a tuple to relation R. (The ti are
    attribute values.)

(2) -R (A1 = t1, A2 = t2, ..., An = tn) is an action
    that deletes a tuple from relation R.

(3) If x is a valid tuple variable, then +R (x) and
    -R (x) are also valid actions.

In RDL1, a rule program is defined to be a finite
set of rules. Rules are specified using a layered
methodology. A first layer of rules defines a set of
target (derived) relations, which are directly derived
from a set of base relations. The second layer of rules
uses the relations derived by the first-layer rules, in
addition to the set of base relations, as its source
relations. This process is iterated to higher layers of
rules.

A rule module is defined as a tuple <S0, T, RU>,
where S0 is a set of source relations, T is a set of
target relations, and RU is a set of rules that satisfy
the following conditions:

(1) Each relation in T is defined only in this module,
    i.e., it does not appear in the right-hand side of a
    rule belonging to another module.

(2) A rule can delete tuples from a relation only if
    this relation has been previously defined within the
    same module.

For example, let Parent (Parent, Child) be a base
relation in some database. The following rule module
derives the target relation Ancestor (Asc, Desc):

MODULE Ancestor:
TARGET
    Ancestor (Asc : text, Desc : text)
RULES
    "x and y are tuple variables defined
     over relations Parent and Ancestor,
     respectively"
    Parent (x) and Ancestor (y) and
        x.Child = y.Asc --> +Ancestor (Asc = x.Parent,
                                        Desc = y.Desc);
    Parent (x) --> +Ancestor (x);
END.
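A set-oriented evaluation of the Ancestor module can be sketched in Python. This is a minimal naive fixpoint under stated assumptions (relations as sets of pairs, both rules applied to whole relations in each round), not the actual RDL1 execution strategy; because every round applies all rules to entire relations, the result does not depend on rule order.

```python
def derive_ancestors(parent: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Apply both Ancestor rules repeatedly until no new tuple is produced."""
    ancestor: set[tuple[str, str]] = set()
    while True:
        # Rule 2: Parent(x) --> +Ancestor(x)
        new = set(parent)
        # Rule 1: Parent(x) and Ancestor(y) and x.Child = y.Asc
        #           --> +Ancestor(Asc = x.Parent, Desc = y.Desc)
        new |= {(p, d) for (p, c) in parent for (a, d) in ancestor if c == a}
        if new <= ancestor:   # fixpoint reached: nothing new was derived
            return ancestor
        ancestor |= new

parent = {("ann", "bob"), ("bob", "cal"), ("cal", "dan")}
print(sorted(derive_ancestors(parent)))
```

Each round joins the base relation with the Ancestor tuples derived so far; the loop stops once a round adds nothing.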

In RDL1, rules do not have to be ordered by the
programmer to express how to solve a problem or derive
data. Also, the order of the predicates in the
conditional part of a rule has no effect on the result.
Maindreville and Simon [MAI87] describe a set-oriented
execution strategy that makes the result of a rule
program independent of the order of rules or
predicates.

The above approaches for augmenting relational
databases with deductive capabilities make use of the
fact that the relational query model satisfies the
closure property, in the sense that an expression in the
relational algebra, calculus, or logic (when a database
is viewed as a first-order theory) against a relational
database always returns a relation as its result. Such a
relation can further be used to derive other relations,
and so forth. The component of the knowledge definition
language for defining deductive rules, to be described in
Chapter 5, is similar in that it makes use of the closure
property defined in our query model for OO databases (see
Chapter 3).

2.2.2 Inference in Artificial Intelligence Systems

Several AI systems that support inference and
deductive reasoning have been introduced in the past few
years. In this section, we survey some of these systems.

KL-ONE [BRA85] provides a language for the explicit
representation of knowledge in Artificial Intelligence
systems. In the KL-ONE kernel, the emphasis is on the
static structure of objects and their interrelations. The
KL-ONE representation is similar to that of the semantic
and OO data models found in the database area. The
following list shows KL-ONE terms and their
corresponding database terms:


KL-ONE                      OO data models
--------------------------  ---------------------------------------
concept                     class
individual concept          instance of a class
subsuming concept           superclass
role                        attribute (aggregation association)
role filler                 attribute value
SuperC link                 generalization association
taxonomy                    schema (semantic diagram)
value restriction (V/R)     attribute type (domain of an attribute)
number restriction (N/R)    cardinality constraints

Inferencing in KL-ONE is performed in two different
ways. First, the KL-ONE implementation provides, at
retrieval time, inherited information about a Role or a
Concept based on the SuperC links that exist between
concepts in the taxonomy. Second, KL-ONE provides a
classification mechanism which, for a concept newly added
to a taxonomy, discovers cases of subsumption that cannot
be read from the concept specification by simple means.
For these cases, the classifier adds the appropriate
SuperC links to the taxonomy. This facilitates the dynamic
creation of concepts during the execution of some task.
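Retrieval-time inheritance along SuperC links can be illustrated with a small Python sketch. The `Concept` class, its fields, and the Person/Student concepts are hypothetical and do not reflect KL-ONE's actual syntax or data structures; the sketch only shows roles being collected from subsuming concepts, with a locally declared role overriding an inherited one.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    roles: dict[str, str] = field(default_factory=dict)    # role -> value restriction
    superc: list["Concept"] = field(default_factory=list)   # SuperC links

    def all_roles(self) -> dict[str, str]:
        """Local roles plus roles inherited along SuperC links."""
        inherited: dict[str, str] = {}
        for parent in self.superc:
            inherited.update(parent.all_roles())
        inherited.update(self.roles)   # local declarations take precedence
        return inherited

person = Concept("Person", {"name": "String"})
student = Concept("Student", {"major": "Department"}, [person])
print(student.all_roles())  # {'name': 'String', 'major': 'Department'}
```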

The Knowledge Engineering Environment KEE [FIK85]
is a system that integrates a production rule language
and a frame-based language to form a hybrid
representation facility that combines the advantages of
both representation schemes. Frames provide the
structural representation of an individual object or of a
class of objects. The slots of a frame represent its
attributes (in database terminology). Frames can be
connected by IS-A (i.e., generalization) links, thus
forming a taxonomy similar to KL-ONE's taxonomy.
Inference based on the inheritance property of an IS-A
hierarchy is supported in KEE. Constraints are used by
other inference methods to determine whether a given
object can be a legitimate member of a frame or a value
of a slot.

In KEE, frames are also used to represent production
rules. A rule defined as a frame has slots that
describe its attributes (e.g., conditions, conclusions,
actions). The predicates of the KEE production-rule
language reflect the relationships that can be
represented in the frame language. For example, the
predicate IN.CLASS takes two arguments, an object name
and a class name, and returns TRUE if the object belongs
to the class and FALSE otherwise. Rules may derive
frames that represent subclasses of existing classes in a
way similar to the technique used by the SDM data model,
which is described in the following subsection.

OPS5 is a production system that was developed at
Carnegie-Mellon University [FOR81]. In OPS5, productions
(i.e., rules) are stored in a memory called "production
memory." These productions operate on a nonpersistent
database (i.e., a database that is not preserved after
the execution of a rule program) called "working memory."
Every element in working memory consists of a relation
name followed by a list of associated attribute-value
pairs that describe one object. For example, the
following is the description of an object named Block10
that belongs to the relation Block (attribute names are
preceded by the ^ sign):

(block
   ^name block10
   ^color red
   ^mass 600
   ^length 50
   ^width 50
   ^height 50)

The condition part of a production, i.e., the
left-hand side (LHS), may contain several conditions.
Similarly, the action part, i.e., the right-hand side
(RHS), may contain more than one action. Each condition in
the LHS is a pattern that describes working memory
elements. During the execution of a rule, the interpreter
compares each pattern of the LHS with the working memory
elements to determine whether the pattern matches any of
them. For example, the pattern (block ^color red) will
match the working memory element described above.


If a variable occurs more than once in a rule, all
occurrences of the variable must be bound to the same
value. For example, the pattern (block ^length <x>
^width <x> ^height <x>), in which there are three
occurrences of the variable <x>, will match any memory
element that is a block and has equal length, width, and
height (i.e., a cube). The RHS of a production consists
of a sequence of actions, which can include one or more
of the following:

make      adds a new element to working memory
delete    deletes a data element from working memory
modify    modifies some attributes of a data element
          that is identified by the LHS of the
          production
call      calls a user-defined operation
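The matching rule described above, including the requirement that repeated variables bind to the same value, can be sketched in Python. This is a simplified illustration, not the OPS5 interpreter: a working memory element is assumed to be a (class, attribute dictionary) pair, a string of the form '<name>' is treated as a variable, and bindings are checked within a single pattern only rather than across a whole LHS.

```python
def match(pattern, element):
    """Return the variable bindings if `pattern` matches `element`, else None."""
    p_class, p_attrs = pattern
    e_class, e_attrs = element
    if p_class != e_class:
        return None
    bindings = {}
    for attr, expected in p_attrs.items():
        if attr not in e_attrs:
            return None
        actual = e_attrs[attr]
        if isinstance(expected, str) and expected.startswith("<") and expected.endswith(">"):
            # Variable: bind on first occurrence, then require the same value.
            if bindings.setdefault(expected, actual) != actual:
                return None
        elif expected != actual:
            return None
    return bindings

block10 = ("block", {"name": "block10", "color": "red", "mass": 600,
                     "length": 50, "width": 50, "height": 50})
cube = ("block", {"length": "<x>", "width": "<x>", "height": "<x>"})
print(match(("block", {"color": "red"}), block10))  # {}
print(match(cube, block10))                          # {'<x>': 50}
```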
By analyzing the OPS5 data representation, we find
that it resembles a relational representation, except that
each value in an element is explicitly tagged with its
attribute name. The OPS5 rule language is based on this
representation. Inferencing based on the inheritance
property of a generalization relationship, as supported by
KL-ONE and KEE, is also supported in our query model and
language (Chapters 3 and 4). The classification mechanism
supported by KL-ONE is similar to the notion of "schema
modification" in databases and is beyond the scope of
this dissertation.

2.2.3 Derived Data in Existing OO and Semantic Models

The semantic data model SDM [HAM81] supports derived
data in the form of derived subclasses and derived
attributes. A derived subclass S can be defined by
specifying a predicate P on the instances of a class C.
Only the C instances that satisfy P also become instances
of S. The following are the different kinds of predicates
that can be specified in SDM.

(1) A predicate can be specified on one or more of the
    attributes of C, and the instances of C whose
    attribute values satisfy this predicate become the
    instances of S. This type of class is referred to in
    SDM as an "attribute-defined subclass."

(2) A subclass S can be defined such that its instances
    are the instances of C that also belong to two other
    subclasses, say S1 and S2, of C. In other words, the
    instances of S are formed by taking the intersection
    of the two classes S1 and S2. The instances of S can
    also be defined to be the union or difference of S1
    and S2. The class S is called a "set-operator
    subclass."

(3)  A subclass  S can  be  defined  such  that  it  contains 
the  instances  of  C that  are  currently  values  of  some 
attribute  of  another  class.  This  type  of  class  is 
called  an  "existence  subclass." 
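The three kinds of derived subclasses can be sketched in Python over a hypothetical Student class. The attribute names (name, gpa, advisor), the Course.ta attribute used for the existence subclass, and all data are assumptions for illustration; SDM itself declares these subclasses in the schema rather than computing them procedurally.

```python
students = [
    {"name": "Ann", "gpa": 3.9, "advisor": "Smith"},
    {"name": "Bob", "gpa": 2.7, "advisor": "Lee"},
    {"name": "Cal", "gpa": 3.5, "advisor": None},
]

# (1) Attribute-defined subclass: instances whose attribute values satisfy P.
honors = [s for s in students if s["gpa"] >= 3.5]

# (2) Set-operator subclass: intersection of two subclasses S1 and S2.
advised = [s for s in students if s["advisor"] is not None]
honors_and_advised = [s for s in honors if s in advised]

# (3) Existence subclass: students that are currently values of an attribute
#     of another class (a hypothetical Course.ta attribute).
courses = [{"title": "DB", "ta": "Cal"}, {"title": "AI", "ta": "Bob"}]
teaching_assistants = [s for s in students
                       if any(c["ta"] == s["name"] for c in courses)]

print([s["name"] for s in honors])               # ['Ann', 'Cal']
print([s["name"] for s in honors_and_advised])   # ['Ann']
print([s["name"] for s in teaching_assistants])  # ['Bob', 'Cal']
```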

The  following  are  some  of  the  techniques  that  SDM 
provides  for  expressing  the  derivation  of  attribute 
values  from  other  information  in  the  schema. 

(1) An attribute can be defined as an ordering
    attribute. In this case, the instances of the class
    are ordered based on the values of the ordering
    attribute. Ordering is by decreasing or increasing
    values, as specified by the schema designer.

(2) The value of an attribute can be declared to be a
    Boolean value (True or False) depending on whether
    or not the instance of the class exists in another
    class.

(3) The value of an attribute can be defined as the
    combination of values that are obtained by
    recursively tracing the values of some other
    attribute. For example, in the schema of Figure 3.1,
    an attribute All-prereq of the class Course can be
    defined such that its values include all levels of
    prerequisite courses derived from the Prereq
    attribute. In other words, the value of the attribute
    All-prereq for a particular Course instance includes
    the immediate prerequisites, their prerequisites,
    and so on.

(4)  Values  of  a set-valued  attribute  can  be  defined  as 
the  intersection,  union,  or  difference  of  two  other 
set-valued  attributes. 

(5)  The  value  of  an  attribute  can  be  derived  based  on  an 
arithmetic  expression  that  involves  the  values  of 
other  attributes. 

(6) The value of an attribute can be defined to be equal
    to the number of members of some other set-valued
    attribute.
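The recursively derived All-prereq attribute of item (3) can be sketched in Python as a traversal of the Prereq attribute. The course names and the dictionary representation of Prereq are assumptions for illustration; the sketch computes, for one course, the immediate prerequisites, their prerequisites, and so on.

```python
prereq: dict[str, set[str]] = {
    "Databases": {"Data Structures"},
    "Data Structures": {"Programming I"},
    "Compilers": {"Data Structures", "Automata"},
    "Programming I": set(),
    "Automata": set(),
}

def all_prereq(course: str) -> set[str]:
    """Derived attribute: all levels of prerequisites reachable via Prereq."""
    result: set[str] = set()
    stack = list(prereq.get(course, ()))
    while stack:
        c = stack.pop()
        if c not in result:          # visited check also guards against cycles
            result.add(c)
            stack.extend(prereq.get(c, ()))
    return result

print(sorted(all_prereq("Compilers")))  # ['Automata', 'Data Structures', 'Programming I']
```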

In the functional data model and the data language
DAPLEX [SHI81], derived data are expressed in the form of
derived functions. Derived functions are specified by
means of DEFINE statements. For example, the following is
the definition of the function Department over the
instances of the class Course (Figure 2.1):

DEFINE Department (Course) =>
    Department (Instructor (Section (Course)))

The function Department of Course can be used in queries
as if it were a primitive function rather than a derived
one.
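The derived function above is a composition of primitive functions, which can be sketched in Python. DAPLEX functions may be multivalued; for brevity this sketch assumes each function is single-valued, and the sample data (C1, Sec1, Instr1, CS) are invented.

```python
section = {"C1": "Sec1"}          # Section(Course)
instructor = {"Sec1": "Instr1"}   # Instructor(Section)
dept = {"Instr1": "CS"}           # Department(Instructor)

def department(course: str) -> str:
    """Derived: Department(Course) => Department(Instructor(Section(Course)))."""
    return dept[instructor[section[course]]]

print(department("C1"))  # CS
```

A caller uses `department` exactly like a stored mapping, which mirrors the point that a derived function is used in queries as if it were primitive.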
The IN operator can be used when a function
takes two arguments of the same type. The following
function compares two students based on their GPAs
and uses the IN operator:

DEFINE Brighter (S1 IN Student, S2 IN Student) =>
    GPA (S1) > GPA (S2)

We note that only limited support for derived data
has been provided in the area of OO and semantic data
models. A full-fledged deductive rule-based language that
is based on the OO modeling constructs and on the closure
property for OO databases is absent. In Chapter 5 of this
dissertation, we introduce such a language.

2.3  Integrity  Constraints 

Integrity constraints are conditions that describe
the illegitimate states of a database. In order for a
database to be an accurate model of some real-world
application, these constraints need to be captured in the
database and enforced by the DBMS. In [BUN79],
constraints for the relational data model have been
classified into "simple" and "complex" constraints. A
simple constraint is one whose conditions are based on a
single relation (also called an intra-relation
constraint), while a complex constraint is one whose
conditions are based on more than one relation (an
inter-relation constraint).

Constraints can also be classified as "implicit" or
"explicit." An implicit constraint is one that is
inherent in the structural constructs of the data model;
examples are the unique key and domain constraints of the
relational data model. (The domain constraint means that
an attribute in a relation can draw values only from its
domain.)

Several constraint specification languages have been
proposed for the relational data model [DAT81, SMI87].
Some other proposals have extended the constructs of the
relational query languages (e.g., SQL, QUEL) to allow for
defining integrity constraints [HEL75, AST76, CHA81]. To
illustrate how constraints can be specified in these
languages, we describe some constraints against the
following relational schema, which consists of the
relations Supplier (S), Part (P), and Shipment (SP). In
this schema, the attribute names are self-explanatory;
S# and P# are the keys of S and P, respectively.

S (S#, Sname, Status, City)

P (P#, Pname, Color, Weight)

SP (S#, P#, Qty)

The following are some constraints defined based on the
language proposed by Date [DAT81].

Constraint 1: Values for the attribute Status must be
positive.

AFTER INSERTING S
AFTER UPDATING S.Status
S.Status > 0;

The AFTER clause specifies when such a constraint needs
to be checked, as does the BEFORE clause in the
following constraint.

Constraint 2: Attribute S# is the primary key of relation
S.

BEFORE INSERTING S FROM NEW_S,
BEFORE UPDATING S.S# FROM NEW_S.S#,
NOT EXIST (S WHERE S.S# = NEW_S.S#)
AND NOT IS_NULL NEW_S.S#

NEW_S and NEW_S.S# refer to the newly inserted S
tuples and the updated S.S# values, respectively.

The  referential  integrity  constraint  of  the 
relational  data  model  specifies  that  if  A is  an  attribute 
of  a relation  R1  that  draws  values  from  a domain  D which 
also  supplies  primary  key  values  for  relation  R2,  then  a 
value  of  A in  R1  must  be  either  Null  or  equal  to  a value 
that  has  been  supplied  in  R2.  This  is  demonstrated  by  the 
following  constraint. 

Constraint 3: Attribute S#, which is an attribute of
relation SP, is the primary key of relation S. Therefore,
for an S# value to exist in a tuple of relation SP, it
must also exist in a tuple of relation S. The constraint
is expressed in Date's language as

SP.S# ->> S.S#

The above three constraints can be specified in the
language proposed by Simon and Valduriez [SIM87] as
follows:

Constraint 1

J = S      "declares J as a variable over relation S"
ASSERT ON J
J.Status > 0

Constraint 2

UNIQUE ON S S#
NOT NULL IN S S#

Constraint 3

X = S
Y = SP
REFERENCES Y.S# WITH X.S#

In  [HEL75],  the  relational  query  language  QUEL  is 
enhanced  with  constructs  that  can  be  used  to  define 
constraints.  For  example,  Constraint  1 above  can  be 
expressed  in  QUEL  as  follows: 

RANGE OF X IS S
INTEGRITY X.Status > 0
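What the three constraints demand of the data can be made concrete with a Python sketch that checks a candidate tuple before it is inserted. This is not how any of the cited languages enforce constraints; the function names, the list-of-dictionaries representation, and the sample S tuple are assumptions for illustration.

```python
S = [{"S#": "S1", "Sname": "Smith", "Status": 20, "City": "London"}]

def check_insert_s(new_s: dict) -> list[str]:
    """Return the constraints a candidate S tuple would violate."""
    errors = []
    status = new_s.get("Status")
    if status is None or status <= 0:                # Constraint 1
        errors.append("Status must be positive")
    s_no = new_s.get("S#")
    if s_no is None:                                  # Constraint 2: NOT NULL
        errors.append("S# must not be null")
    elif any(s["S#"] == s_no for s in S):             # Constraint 2: UNIQUE
        errors.append("duplicate S#")
    return errors

def check_insert_sp(new_sp: dict) -> list[str]:
    """Constraint 3: SP.S# must be null or match an existing S.S#."""
    s_no = new_sp.get("S#")
    if s_no is not None and all(s["S#"] != s_no for s in S):
        return ["S# not found in S"]
    return []

print(check_insert_s({"S#": "S1", "Status": -5}))  # ['Status must be positive', 'duplicate S#']
print(check_insert_sp({"S#": "S9", "P#": "P1", "Qty": 100}))  # ['S# not found in S']
```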

In the area of OO and semantic data models, several
types of constraints have been identified. For example,
the mapping constraints between object classes have been
incorporated in the entity-relationship (ER) data model
[CHE76] and in the OSAM* data model [SU83, SU88].
Set-intersection, set-exclusion, and set-equality
constraints can be defined between any pair of subclasses
of a generic class [SU88]. In the semantic data model SDM
[HAM81], constraints on the attributes of a class can be
specified. For example, an attribute can be specified as
"mandatory," which means that a null value is not allowed
for it. An attribute can also be specified to be
"exhaustive" of its domain, which means that every
instance in the domain class must participate as a value
of this attribute. If an attribute is defined as a
set-valued attribute, it can be constrained to be
"nonoverlapping." This means that no two values of this
attribute can have any instance in common.
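The three SDM attribute constraints can be expressed as Python checks over a class extension. The function names, the Faculty-like instances, and the dept/courses attributes are assumptions for illustration; the sketch treats `dept` as single-valued and `courses` as set-valued.

```python
def mandatory(instances, attr):
    """No instance may have a null value for attr."""
    return all(i.get(attr) is not None for i in instances)

def exhaustive(instances, attr, domain):
    """Every instance of the domain class appears as some value of attr."""
    return set(domain) <= {i.get(attr) for i in instances}

def nonoverlapping(instances, attr):
    """No two values of the set-valued attr share an element."""
    seen = set()
    for i in instances:
        values = set(i.get(attr) or ())
        if seen & values:
            return False
        seen |= values
    return True

departments = ["CISE", "EE"]
faculty = [{"name": "A", "dept": "CISE", "courses": {"C1", "C2"}},
           {"name": "B", "dept": "EE", "courses": {"C3"}}]
bad = faculty + [{"name": "C", "dept": None, "courses": {"C2"}}]

print(mandatory(faculty, "dept"), exhaustive(faculty, "dept", departments),
      nonoverlapping(faculty, "courses"))   # True True True
print(mandatory(bad, "dept"), nonoverlapping(bad, "courses"))  # False False
```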

Though several proposals for a constraint
specification language have been introduced for the
relational data model, the area of constraint
specification for the class of OO data models needs more
investigation. In this work, the semantics of constraints
in OO databases is studied, and the results are reported
in Chapter 6.

Figure 2.1 A Simple Schema


CHAPTER  3 

A CLOSED  QUERY  MODEL 
FOR  OBJECT-ORIENTED  DATABASES 

3.1 Introduction

Several query languages, such as DAPLEX [SHI81], GEM
[ZAN83, TSU84], ARIEL [MAC85], and the object-oriented
query language described in [BAN88], which are based on
the OO view of data as briefly described in Chapter 1 (or
variations of it), have been introduced in the literature
(a detailed survey is provided in Chapter 2). A query in
these languages is defined by choosing one of the
nonprimitive classes in the schema as a central class
(anchor class). The user then specifies some path
expressions such that each path expression starts from
the central class and ends at a primitive class. A
restriction condition can be specified on the class
referenced at the end of a path expression, and/or such a
class can be specified in the target list (i.e., the list
of attributes to be retrieved). The result of a query is
defined as a set of tuples (i.e., a relation), each of
which corresponds to a single instance of the central
class and contains values related to that instance that
are collected from the primitive classes in the target
list.
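The anchor-class query style can be sketched in Python to show why its output is flat. The object store, OIDs, attributes, and the `follow` helper are hypothetical; the point is that each path expression starts at an instance of the central class (Student here) and ends at a primitive value, so the result is a set of tuples rather than classes and associations.

```python
from typing import Any

objects: dict[str, dict[str, Any]] = {
    "s1": {"name": "Ann", "major": "d1"},   # Student objects
    "s2": {"name": "Bob", "major": "d2"},
    "d1": {"name": "CISE"},                 # Department objects
    "d2": {"name": "EE"},
}
students = ["s1", "s2"]   # extension of the anchor class Student

def follow(oid: str, path: list[str]) -> Any:
    """Evaluate a path expression such as major.name from one object."""
    value: Any = oid
    for attr in path:
        value = objects[value][attr]
    return value

# Restriction on major.name; target list (name, major.name).
result = {
    (follow(s, ["name"]), follow(s, ["major", "name"]))
    for s in students
    if follow(s, ["major", "name"]) == "CISE"
}
print(result)  # {('Ann', 'CISE')}
```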

These languages share the above characteristics even
though they are based on different paradigms. For
example, DAPLEX is based on the functional paradigm. An
attribute of a class in DAPLEX is treated as a function
which, when applied to an object of the class, returns its
attribute value from the related class. Path expressions
starting from a central class can be specified by
composing functions. The query language of [BAN88], on
the other hand, is based on the message passing paradigm.
In this language, an attribute of a class is treated as a
message which, when sent to an object of the class,
returns its attribute value from the related class. Path
expressions starting from a central class can be
specified using a series of messages, each of which is
sent to the object returned by the previous message in
the series.

A major drawback of existing data manipulation
languages that have been designed for the class of OO
data models is that the result of a query is not
represented using the same data model with which the
original database is modeled. In these languages, the
input to a query has an OO representation (i.e., classes
and their associations), and its output is a relation (a
table of attribute values). In other words, the closure
property is not maintained by these languages.
Consequently, the result of a query cannot be further
operated on uniformly using the same query language
operators to produce new results.

Motivated by this limitation of the existing OO
query languages, we introduce in this chapter a query
model for OO databases that maintains the closure
property. A query in our query model returns a
subdatabase whose structure consists of some selected
classes of objects and their aggregation and
generalization associations, i.e., it has the same
structural characteristics as those of the original
database. For this reason, a resulting subdatabase can be
further operated on by another query in a uniform way to
produce another subdatabase, or it can be saved as a view
definition. The instances (objects) that satisfy the
search conditions and participate in the patterns of
object associations specified in the query form the
extension of the resulting subdatabase. Furthermore, we
make our query model concrete by introducing the OO query
language OQL (Chapter 4) as an example of this query
model.

3.2  Definition  of  a Query  Model 

A query model is an abstraction of a set of query
languages that share some common properties. Different
query languages that have different syntax, flavor, and
degrees of user-friendliness can be designed based on a
given query model. We define a query model in terms of
the three properties given below. Based on these
properties, the relational languages SQL, QBE, QUEL, and
the relational algebra can be classified under one query
model, which we refer to as the relational query model.

Property 1: The structural constructs that are to be
manipulated, i.e., the kinds of structural constructs
that are provided by the data model for which the query
model is defined. The structure that query languages of
the relational query model are designed to manipulate is
the relation, which is the only structure provided by the
relational data model. The input to a relational query
can be a single relation or a set of relations.

Property 2: The structural constructs with which the
output of a query can be represented. For example, the
structure of the output of a query in the relational
query model is always a relation.

Property 3: The kinds of operators that are needed in a
query language that belongs to the query model, such that
applying any of these operators to an input that
satisfies Property 1 (i.e., one modeled using some
specified data model) produces an output that satisfies
Property 2 (i.e., an output with the desired structure).
The kinds of operators supported by languages of the
relational query model are the target list specification
(which corresponds to Projection in the relational
algebra) and the specification of predicates on
attributes (which corresponds to Join and Restriction in
the relational algebra).

We define a "closed" query model as a query model in
which the structure of the output of a query (Property 2)
is represented using the same data model with which the
input of the query (Property 1) is represented;
otherwise, we refer to it as a "hybrid" query model.
According to this definition, the relational query model
is closed, since the input to a query as well as its
output are modeled (i.e., structurally represented) using
the relational data model.
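The closure of the relational query model can be demonstrated with a small Python sketch in which every operator takes relations and returns a relation, so results can be fed directly to further operators. The (attribute names, set of tuples) representation, the operator names, and the reuse of the Q and R relations from the earlier PROSQL example (with assumed R data) are illustrative choices, not a standard API.

```python
Relation = tuple[tuple[str, ...], frozenset]

def select(r: Relation, pred) -> Relation:
    """Restriction: keep tuples whose attribute dict satisfies pred."""
    attrs, rows = r
    return attrs, frozenset(t for t in rows if pred(dict(zip(attrs, t))))

def project(r: Relation, keep: tuple[str, ...]) -> Relation:
    """Projection onto the attributes in keep."""
    attrs, rows = r
    idx = [attrs.index(a) for a in keep]
    return keep, frozenset(tuple(t[i] for i in idx) for t in rows)

def join(r: Relation, s: Relation) -> Relation:
    """Natural join on the attributes the two relations share."""
    (ra, rrows), (sa, srows) = r, s
    common = [a for a in ra if a in sa]
    extra = [a for a in sa if a not in ra]
    rows = frozenset(
        t + tuple(u[sa.index(a)] for a in extra)
        for t in rrows for u in srows
        if all(t[ra.index(a)] == u[sa.index(a)] for a in common)
    )
    return ra + tuple(extra), rows

Q: Relation = (("A", "B"), frozenset({("a1", "b1"), ("a2", "b2")}))
R: Relation = (("B", "C"), frozenset({("b1", "c1"), ("b2", "c2")}))

# Each result is again a relation, so operators compose freely.
T = project(select(join(Q, R), lambda row: row["C"] == "c2"), ("A", "C"))
print(T)  # (('A', 'C'), frozenset({('a2', 'c2')}))
```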

Besides  its  conceptual  naturalness,  maintaining  the 
closure  property  in  a query  model  has  several  advantages. 
For  example,  the  result  of  a query,  since  it  is  modeled 
by  the  same  data  model,  can  be  uniformly  operated  on  by 
another  query  (i.e.,  using  the  operators  of  the  same 
query  language)  to  further  produce  a new  result.  The 
result  of  a query  can  also  be  saved  as  a view  definition 
and  manipulated  uniformly  as  the  original  database. 

Existing query languages that have been introduced
so far for the class of OO data models (see the survey
provided in Chapter 2) do not satisfy the closure
property, and therefore they belong to the hybrid query
model. A query in these languages operates on a database
that has an OO representation and produces a relation. In
the following section, we introduce a new query model for
OO databases that satisfies the closure property.

3.3 Our OO Query Model

We define our query model by specifying each of the
three properties described above for it. In Section
3.3.1, we first describe the OO view of an application
world as represented by an OO data model. This represents
the input to a query in our query model and corresponds
to the first of the three properties. Section 3.3.2
describes the notion of a "subdatabase" and the induced
generalization association construct. A subdatabase is a
structure that holds the result of a query and thus
represents the second property of our query model. The
third property of our query model is described in Section
3.3.3, where we abstractly describe the kinds of
operators that are needed in a query language that
belongs to this query model.

3.3.1  The  Object-Oriented  View  of  Databases 

The OO view of an application world is represented
in the form of a network of object classes and
associations (links) between these classes. In order to
give a better description of the OO representation, we
shall briefly describe a university application domain
using the OO semantic association model OSAM* [SU88] as
an example. The university schema shown in Figure 3.1
is then used in the remainder of this dissertation as the
application domain for which example queries, deductive
rules, and constraints are defined. Though the OSAM* data
model is used here, the concepts and techniques
introduced in this dissertation can be applied to any
other OO data model and are not limited to OSAM*.

In OSAM*, object classes are graphically represented
as  nodes  and  associations  among  object  classes  are 
represented  as  links.  The  resulting  diagram  is  called  the 
Semantic  Diagram  or  S-diagram.  In  OSAM*,  there  are  two 
types  of  object  classes:  Entity  object  classes  (E-class) 
and  Domain  object  classes  (D-class)  which  are  represented 
in  the  S-diagram  as  rectangular  and  circular  nodes, 
respectively.  The  sole  function  of  a D-class  is  to  form  a 
domain  of  possible  values  (e.g.,  integers,  strings,  etc.) 
from  which  descriptive  attributes  of  objects  draw  their 
values.  An  E-class,  on  the  other  hand,  forms  a domain  of 
objects  which  occur  in  an  application's  world  (e.g., 
Faculty,  Department,  etc.).  Each  object  of  an  E-class  is 
represented  by  a unique  object  identifier  (OID). 

There are five types of links (associations) in OSAM*. Two of these association types appear in Figure 3.1, namely, Aggregation (A) and Generalization (G), which are also recognized in several other semantic and OO data models. A class can have several types of links and more than one link of each type emanating from it. In the S-diagram, links of the same type that emanate from a class are grouped together and labeled by the letter that denotes the association type.

As an example, in Figure 3.1, the E-class Person has two types of links: Aggregation links connecting Person to the D-classes SS# and Name, and Generalization links to the E-classes Student and Teacher (i.e., Student and Teacher are subclasses of the superclass Person). An aggregation link represents an attribute and has the same name as the class it connects to unless specified otherwise (e.g., the link labeled Major, which emanates from the class Student, has a different name from the class it points to). Aggregation links that emanate from an E-class and connect to D-classes represent the descriptive attributes of that class (e.g., the attribute section# of the class Section). A class inherits all the aggregation associations that connect to or emanate from its superclasses. Figure 3.2 shows the actual view of the class RA, in which all the associations inherited by RA are explicitly represented. A detailed description of the OSAM* model can be found in [SU88].
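As a rough illustration of how aggregation links and their inheritance along generalization links fit together, the following Python sketch encodes a small fragment of the S-diagram of Figure 3.1. It is not an OSAM* implementation: the class `OClass`, its fields, and the assumption that the Major link of Student points to Department are hypothetical, and only links that emanate from a class (not links that connect to it) are tracked.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class OClass:
    """One node of an S-diagram (a hypothetical, simplified encoding)."""
    name: str
    kind: str                                          # "E" for an E-class, "D" for a D-class
    aggregation: dict = field(default_factory=dict)    # link name -> target OClass
    superclasses: list = field(default_factory=list)   # classes whose G links point here

    def all_aggregations(self) -> dict:
        """Local aggregation links plus every link inherited from superclasses."""
        links = {}
        for sup in self.superclasses:
            links.update(sup.all_aggregations())
        links.update(self.aggregation)                 # local links take precedence
        return links


# Fragment of the university schema; Student.Major -> Department is an assumption.
ssn, name = OClass("SS#", "D"), OClass("Name", "D")
department = OClass("Department", "E")
person = OClass("Person", "E", {"SS#": ssn, "Name": name})
student = OClass("Student", "E", {"Major": department}, [person])
teacher = OClass("Teacher", "E", {}, [person])

print(sorted(student.all_aggregations()))   # ['Major', 'Name', 'SS#']
print(sorted(teacher.all_aggregations()))   # ['Name', 'SS#']
```

The inherited view computed here plays the role that Figure 3.2 plays for RA: every aggregation link reachable through superclasses is made explicit on the subclass.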

Unlike semantic networks in the area of artificial intelligence, the S-diagram does not represent the actual instances of the object classes of Figure 3.1 (e.g., the individual departments, which are the instances of the class Department). This is normally cited as the most fundamental difference between semantic networks in artificial intelligence and semantic data models in database management systems [HUL87]. The difference arises because database systems normally deal with data-rich applications, for which it is more convenient to abstract the structure of the data from the data itself.

The following are the essential differences between the OO and relational views of a database:

(1) In the relational view of data [COD70, DAT81, ELM89], a tuple in a table represents the relation that exists among objects drawn from some primitive domains (Integer, Real, String, etc.). For example, consider the following relational schema:

Employee (E#, Ename, Salary, Dept#)

Emp-phones (E#, ph#, time)

Department (Dept#, Dname, Location)

An Employee tuple represents a relation among the following objects (attribute values), in that order: an Integer, a String, a Real number, and another Integer, drawn from the domains E#, Ename, Salary, and Dept#, respectively. Each of the objects in these domains has its own independent existence. The case is similar for Department and Employee-phones (Emp-phones) tuples. In an OO data model, on the other hand, every object of the application world is represented by a unique identifier (OID) that is independent of the set of descriptive attribute values (or instance variable values) that characterize or describe it.

(2) In the relational data model, the descriptive data about an object from the application world may become logically scattered among several relations due to normalization. For example, the descriptive data about an employee in the above schema are scattered between two relations, namely, Employee and Emp-phones. (This decomposition is necessary in the relational data model if each employee can be reached at different phones depending on the time of day.) The mapping between these tuples from different relations and the corresponding real-world object has to be consciously maintained by the user. In an OO data model, on the other hand, descriptive data about an object are not logically scattered but localized.

(3) In the relational data model, an association between two tuples is represented indirectly (implicitly). For example, in the above schema, an Employee tuple is associated with a Department tuple via a third party, viz. the objects of the Dept# domain. In other words, an Employee tuple must reference a Dept# object (i.e., an integer from the Dept# class of objects) that has been referenced by a Department tuple. The referential integrity constraint [DAT81] effectively limits the Dept# objects that can be referenced by tuples in the Employee table to only those that have been referenced by tuples in the Department table. Changes made to the key attribute values may therefore modify the associations already specified in the database. In the OO view of data, an association between two objects is specified by explicitly (directly) relating one OID to another. A user may also assign an alias to an OID, which he/she can later use to refer to the object. Changes made to key descriptive attribute values do not affect the associations among objects.

(4) Consequently, in a relational database, a Join between the two relations over Dept# is necessary to retrieve the Employee and Department tuples that are related. In an OO database, on the other hand, the Employee and Department objects are already stored in a "Joined" form, since the objects of one class may use the OIDs of the objects of the other class to explicitly specify the object associations.

(5) The concept of Generalization is not incorporated in the relational data model. In an OO data model, classes of objects that share common structural and operational characteristics are grouped into generalization hierarchies where properties, operations, and associations are inherited downward.
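The contrast drawn in points (3) and (4) can be made concrete with a small Python sketch. The rows, the names Ann and Bob, and the class definitions are hypothetical, and Python object references merely stand in for OIDs. The toy join below has no referential integrity enforcement; a relational DBMS enforcing that constraint would reject or cascade the key change rather than silently lose the association.

```python
from dataclasses import dataclass

# Relational view (hypothetical rows): the association lives in equal Dept# values.
employees = [(1, "Ann", 50000.0, 10), (2, "Bob", 42000.0, 20)]  # (E#, Ename, Salary, Dept#)
departments = [(10, "CS", "Bldg A"), (20, "EE", "Bldg B")]      # (Dept#, Dname, Location)


def join_on_dept(emps, depts):
    """Equi-join Employee and Department over Dept#, returning (Ename, Dname)."""
    by_key = {d[0]: d for d in depts}
    return [(e[1], by_key[e[3]][1]) for e in emps if e[3] in by_key]


# OO view: the association is a direct reference; object identity stands in for an OID.
@dataclass(eq=False)
class Department:
    dept_no: int
    dname: str


@dataclass(eq=False)
class Employee:
    e_no: int
    ename: str
    dept: Department


cs, ee = Department(10, "CS"), Department(20, "EE")
staff = [Employee(1, "Ann", cs), Employee(2, "Bob", ee)]
print(join_on_dept(employees, departments))      # [('Ann', 'CS'), ('Bob', 'EE')]

# Change the CS department's key value in both views (no cascading update).
departments[0] = (99, "CS", "Bldg A")
cs.dept_no = 99
print(join_on_dept(employees, departments))      # [('Bob', 'EE')]
print([(e.ename, e.dept.dname) for e in staff])  # [('Ann', 'CS'), ('Bob', 'EE')]
```

In the relational half, the association must be recomputed by a join and depends on key values; in the OO half, it is stored directly and survives the change to dept_no.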

3.3.2 Subdatabases and the Induced Generalization Association

The result of a query in our query model is a subdatabase. A subdatabase is a portion of the original database and consists of two parts: an intensional association pattern and a set of extensional association patterns. The intensional association pattern of a subdatabase is represented as a network of object classes and their associations. For example, Figure 3.3 shows a certain subdatabase SDB of the original database of Figure 3.1. Figure 3.3a represents the intensional association pattern of this subdatabase, which consists of the classes Teacher, Section, and Course and their associations.

An extensional association pattern is a network of instances and their associations that belong to the classes and association types of the intensional association pattern. The set of extensional patterns of a subdatabase can be represented in the form of an extensional diagram. Figure 3.3b shows a possible extensional diagram for the subdatabase SDB, where the t's, s's, and c's are the OIDs for objects from the classes Teacher, Section, and Course, respectively. The interconnection of t3 and s4 in the figure is an example of an extensional pattern, which records the fact that object t3 is associated with object s4 (Teacher t3 is teaching Section s4).

A normalized extensional diagram is an extensional diagram in which an object may appear more than once, depending on the number of associations it has with the objects of a neighboring class, and a separate link is used to connect it to each object of the neighboring class. Figure 3.3c shows the normalized extensional diagram for the same subdatabase. In the remainder of this dissertation, we deal with normalized extensional diagrams only. In addition to the graphical representation, an extensional pattern may be represented as a tuple of OIDs. For example, <t1,s2,c1>, <c3>, and <s5,c4> are some of the extensional patterns that appear in the subdatabase SDB (Figure 3.3c).
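The tuple representation can be sketched in Python. The set below collects the extensional patterns of SDB that this chapter and Chapter 4 mention by name, assuming (not verified against the figure) that they are the complete set shown in Figure 3.3c. Each tuple is ordered as (Teacher, Section, Course), with None standing in for a Null component. The function derives the links of a normalized diagram, one per associated pair of neighboring objects.

```python
CLASSES = ("Teacher", "Section", "Course")   # intensional pattern of SDB, in linear order

# Extensional patterns of SDB named in the text; None marks a Null component.
SDB = {
    ("t1", "s2", "c1"), ("t2", "s3", "c1"), ("t2", "s3", "c2"),
    ("t3", "s4", None), (None, "s5", "c4"),
    ("t4", None, None), (None, None, "c3"),
}


def normalized_edges(patterns):
    """One link per associated pair of objects from neighboring classes."""
    edges = set()
    for ep in patterns:
        for left, right in zip(ep, ep[1:]):   # neighboring classes in the pattern
            if left is not None and right is not None:
                edges.add((left, right))
    return edges


print(sorted(normalized_edges(SDB)))
# [('s2', 'c1'), ('s3', 'c1'), ('s3', 'c2'), ('s5', 'c4'),
#  ('t1', 's2'), ('t2', 's3'), ('t3', 's4')]
```

Section s3 takes part in two links, one to each of c1 and c2, which is exactly the situation in which a normalized diagram draws the object once per association.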

In our query model, a query operates on one or more subdatabases, which we call the source subdatabases, and returns a new subdatabase, which we refer to as the target subdatabase. (The original database is manipulated just as any of the subdatabases, i.e., it can be one of the source subdatabases of a query.) The intensional pattern of a target subdatabase is derived from the intensional patterns of the source subdatabases. Likewise, the extensional patterns of a target subdatabase are derived from the extensional patterns of the source subdatabases based on the conditions specified in the query.

We refer to a class in a target subdatabase as a target class. The class in a source subdatabase from which a target class is derived is called the source class. The instances of a target class are a subset of the instances of the source class from which it is derived.

Between every target class and its source class, there is an induced generalization association, which emanates from the source class and connects to the target class. Therefore, a target class inherits all the aggregation associations of its source class, which establishes the inter-subdatabase connections described in the following example.

Let SD be a subdatabase whose intensional pattern is as shown in Figure 3.4. Let SD1 and SD2 be two subdatabases that are derived from SD by two queries (Figure 3.4). Class A that appears in SD1, which we refer to as SD1:A, is derived from the corresponding class in SD, i.e., SD:A (SD:A and SD1:A are the source and target classes, respectively). Therefore, there is an induced generalization association that emanates from SD:A and connects to SD1:A, as shown in the figure. Similarly, there is an induced generalization association emanating from SD:D and connecting to SD2:D, and two induced generalization associations emanating from SD:C and connecting to the classes SD1:C and SD2:C.

Each of the classes of SD1 and SD2 inherits all the aggregation associations of its superclass in SD. For example, SD1:A inherits the links that connect to and the links that emanate from its superclass SD:A, which effectively means that there are aggregation associations between SD1:A and each of the classes SD:B and SD:C. The inherited association that connects SD1:A to SD:C is further inherited by SD1:C, which is a subclass of SD:C. Since this association that connects SD1:A to SD1:C is local (internal) to the subdatabase SD1, it is explicitly represented in its intensional pattern. Similarly, the inherited association that emanates from SD1:A and connects to SD:C is further inherited by SD2:C. This means that there is an (inherited) association between the two classes SD1:A and SD2:C, which belong to two different subdatabases. Thus, the inheritance property of the induced generalization association establishes the connections between classes from different subdatabases. Figure 3.5 shows a representation equivalent to that of Figure 3.4, in which the aggregation associations that the class SD1:A inherits from its superclass SD:A are explicitly represented. (Similarly, the classes of the subdatabase SDB of Figure 3.3 inherit all the aggregation associations of their source classes in the database of Figure 3.1, and the associations among the classes Teacher, Section, and Course are explicitly represented because they are local to this subdatabase.) The intensional association pattern of a subdatabase consists of the classes that appear in the subdatabase and the associations that are local to it.

We note here that if a class name is referenced in any expression without qualifying it by a subdatabase name, the "base" class (i.e., the class that appears in the original database) is assumed.

Formally, we define a subdatabase as follows. A subdatabase consists of an intensional pattern IP and a set of extensional patterns EP. We define an intensional pattern as a graph, $IP = (V, L)$, where $V = \{v_p\}$ is a set of nodes and $L = \{l_t\}$ is a set of links, each of which is of some type $t \in T$. In this definition, $T = \{A, G\}$, where A and G stand for Aggregation and Generalization, respectively. Each node in IP represents a class and each link represents an association between two classes, i.e., a link can be identified by an ordered pair $(v_x, v_y)$ of nodes.

A subdatabase has a set of extensional patterns $EP = \{ep_j\}$. Each $ep$ is defined as a graph, $ep = (O, E)$, where $O = \{o_i\}$ is a set of objects and $E = \{e_k\}$ is a set of edges, each of which is labeled by some $k \in K$. In this definition, $K = \{I, N\}$, where I and N stand for Identity and Non-identity, respectively. An edge is labeled I or N depending on whether the corresponding link in L is of type G or A, respectively. Each $o_i \in O$ must be an instance of some $v_p \in V$ of the subdatabase. Each edge in ep connects two objects and can be identified by an ordered pair $(o_x, o_y)$.
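The formal definition maps naturally onto two small data structures. The Python sketch below is one possible encoding, not part of the query model itself. It assumes that an object occurrence is written as a (class name, OID) pair, so that an Identity edge can join one object's occurrences in a superclass and a subclass. The field names V, L, O, and E deliberately mirror the notation above. The three-class intensional pattern, including the direction of the Teacher-Section link, is hypothetical.

```python
from dataclasses import dataclass, field

T = {"A", "G"}                  # link types: Aggregation, Generalization
LABEL = {"G": "I", "A": "N"}    # edge labels in K = {I, N}: Identity for G, Non-identity for A

Obj = tuple[str, str]           # an object occurrence as (class name, OID)


@dataclass(frozen=True)
class IntensionalPattern:       # IP = (V, L)
    V: frozenset                # class names
    L: frozenset                # links (v_x, v_y, t) with t in T

    def __post_init__(self):
        for vx, vy, t in self.L:
            if vx not in self.V or vy not in self.V or t not in T:
                raise ValueError(f"invalid link {(vx, vy, t)}")


@dataclass
class ExtensionalPattern:       # ep = (O, E)
    ip: IntensionalPattern
    O: set = field(default_factory=set)
    E: dict = field(default_factory=dict)   # (o_x, o_y) -> label in K

    def connect(self, ox: Obj, oy: Obj) -> None:
        """Add the edge (o_x, o_y), labeled by the type of the link it instantiates."""
        for o in (ox, oy):
            if o[0] not in self.ip.V:       # each o_i must be an instance of some v_p
                raise ValueError(f"{o} is not an instance of a class in V")
        types = [t for vx, vy, t in self.ip.L if (vx, vy) == (ox[0], oy[0])]
        if not types:
            raise ValueError(f"no link ({ox[0]}, {oy[0]}) in L")
        self.O.update((ox, oy))
        self.E[(ox, oy)] = LABEL[types[0]]


ip = IntensionalPattern(
    frozenset({"Person", "Teacher", "Section"}),
    frozenset({("Person", "Teacher", "G"), ("Teacher", "Section", "A")}),
)
ep = ExtensionalPattern(ip)
ep.connect(("Person", "t3"), ("Teacher", "t3"))   # same object seen in two classes
ep.connect(("Teacher", "t3"), ("Section", "s4"))
print(ep.E)
```

Because an edge is accepted only when a matching ordered link exists in L, the edge labels follow mechanically from the link types, as the definition requires.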

3.3.3 Operators

We complete the definition of our query model by describing the functionality of the operators that are needed in a query language that belongs to this query model. At a high level of abstraction, these operators, when used in queries, should be able to perform the following functions:

(1) Operate on a set of subdatabases and produce a single subdatabase as the result of a query.

(2) Identify the intensional association pattern of the resulting subdatabase.

(3) Identify the set of extensional association patterns of the resulting subdatabase.

(4) Identify the associations that a class in the resulting subdatabase may not inherit from its superclass in the source subdatabase. In other words, the operators should, if needed, be able to block the inheritance of some associations by any of the target classes in the resulting subdatabase.

Different query languages with different syntax and flavor can be designed based on this query model. The object-oriented query language OQL introduced in Chapter 4 is one possible language.
Figure 3.1: University Schema
Figure 3.2: The Actual View of the Class RA
Figure 3.3a: The Intensional Pattern of a Subdatabase SDB
Figure 3.3: The Intensional and Set of Extensional Patterns of a Subdatabase
Figure 3.4: Subdatabases SD1 and SD2 Derived from SD

CHAPTER 4

OQL: AN OBJECT-ORIENTED QUERY LANGUAGE

4.1 Introduction

In this chapter, we make our query model concrete by introducing the object-oriented query language OQL as an example of this query model. As described in Chapter 3, a major drawback of existing OO query languages is that they do not maintain the closure property. In contrast, an OQL query can be considered as a function that, when applied to a database, returns a subdatabase whose structure is composed of some selected classes of objects and their associations, i.e., it has the same structural characteristics as those of the original database. Therefore, the closure property is maintained in OQL.

Another shortcoming of existing OO query languages is that they are oriented towards retrieval and storage manipulation operations (i.e., system-defined operations), in the sense that a query specifies data to be manipulated by such operations as Retrieve, Update, and Delete. In a behaviorally OO data model, user-defined operations that describe the behavior of object classes can be defined and registered with these classes. In OQL, a subdatabase represents a "context" in which objects of various classes must exist before being considered for further manipulation by different operations, including system-defined operations (e.g., Update, Display) as well as user-defined operations (e.g., Hire-employee, Move). Multiple operations (messages) can be issued against the different classes of a subdatabase in the same query.

This chapter introduces the OQL language and is organized as follows. In Section 4.2, the overall structure of an OQL query is defined and the basic operators used for defining subdatabases are introduced. Section 4.3 presents some advanced features of OQL. The distinguishing advantages of OQL are summarized in Section 4.5.

4.2 Basic Features of OQL

In this section, the general OQL syntax and semantics are described and illustrative examples are given. In what follows, capital letters are used to denote E-classes (A, B, ...), and small letters with an integer appended are used to denote objects (or OIDs) (e.g., a1, a2, ... and b1, b2, ... are OIDs for objects that belong to the classes A and B, respectively).

4.2.1 Definitions and Overview

We define an extensional pattern type as the common template that is shared by several extensional association patterns in a subdatabase. A pattern type is denoted by a tuple of class names. For example, <Teacher, Section, Course> is one of the extensional pattern types that exist in Figure 3.3c; its instances are all the extensional association patterns that contain Teacher, Section, and Course objects, i.e., the extensional patterns <t1,s2,c1>, <t2,s3,c1>, and <t2,s3,c2>. On the other hand, the extensional pattern <t3,s4>, whose Course-component is Null (since the pattern does not contain any Course object), is of the type <Teacher, Section>. The five extensional pattern types present in the extensional diagram of Figure 3.3c are <Teacher, Section, Course>, <Teacher, Section>, <Section, Course>, <Teacher>, and <Course>.
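The type of a pattern is simply the list of classes whose components are non-null. The Python sketch below computes it for the patterns of SDB that the text names, in an assumed (Teacher, Section, Course) tuple encoding with None for Null; the function name is hypothetical.

```python
CLASSES = ("Teacher", "Section", "Course")

# Extensional patterns of SDB named in the text; None marks a Null component.
SDB = {
    ("t1", "s2", "c1"), ("t2", "s3", "c1"), ("t2", "s3", "c2"),
    ("t3", "s4", None), (None, "s5", "c4"),
    ("t4", None, None), (None, None, "c3"),
}


def pattern_type(ep):
    """The tuple of class names whose component in ep is non-null."""
    return tuple(cls for cls, oid in zip(CLASSES, ep) if oid is not None)


types = {pattern_type(ep) for ep in SDB}
print(len(types))                          # 5
print(pattern_type(("t3", "s4", None)))    # ('Teacher', 'Section')
```

The five distinct types agree with the list given above, and the three instances of <Teacher, Section, Course> are the three fully populated tuples.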

The philosophy underlying OQL is to allow the user to specify, first, the desired subdatabase, by specifying its intensional pattern and the set of extensional pattern types that are of interest, and then the operation(s) to be performed on the classes of the subdatabase. The search engine of the underlying OO DBMS would establish the subdatabase by identifying all the extensional patterns that belong to the specified types and then perform the operation(s).

A query block in OQL consists of a CONTEXT clause and an OPERATION clause. The CONTEXT clause has two optional subclauses: a WHERE subclause and a SELECT subclause. This structure is shown below.

CONTEXT association pattern expression
WHERE conditions
SELECT object classes and/or attributes
OPERATION(s) object class(es)

In the CONTEXT clause, the user specifies a desired subdatabase by specifying its intensional pattern and the extensional pattern types of interest (both are specified in the association pattern expression). A linear association pattern expression has the form "A [intra-class conditions] op B [intra-class conditions] op C [intra-class conditions] ...", where "op" is one of the association pattern operators described in Sections 4.2.2 and 4.2.3. Each operator separates two E-classes that are directly associated in a schema. More complex association pattern expressions that contain branching are described in Section 4.3. The intra-class conditions enclosed in brackets following a class name are optional and are expressed in the form of predicates that involve the descriptive attributes of that class.

The WHERE subclause further causes the extensional patterns that do not satisfy some conditions to be dropped from the Context subdatabase. The conditions that can be specified in the WHERE subclause are inter-class comparison conditions, which are comparisons between attributes of two classes if these attributes are type comparable, and/or comparisons between objects (equal '=' or not equal '!=') if these objects are type comparable (i.e., belong to the same E-class or to one of its superclasses or subclasses).
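The type-comparability rule for object comparisons can be sketched in Python. The hierarchy below is hypothetical (in particular, TA is assumed to be a subclass of both Student and Teacher, and the OIDs are invented), and objects are written as (class name, OID) pairs. Following the rule above, two classes are comparable only if they are the same class or one is a superclass of the other, so sibling classes such as Student and Teacher are rejected.

```python
# Hypothetical generalization hierarchy: class -> its direct superclasses.
SUPERCLASSES = {"Student": {"Person"}, "Teacher": {"Person"}, "TA": {"Student", "Teacher"}}


def ancestors(cls):
    """All (transitive) superclasses of cls."""
    result = set()
    for sup in SUPERCLASSES.get(cls, set()):
        result |= {sup} | ancestors(sup)
    return result


def type_comparable(c1, c2):
    """Same E-class, or one class is a superclass or subclass of the other."""
    return c1 == c2 or c1 in ancestors(c2) or c2 in ancestors(c1)


def object_condition(op, x, y):
    """Evaluate 'x = y' or 'x != y' for objects given as (class, OID) pairs."""
    if op not in ("=", "!="):
        raise ValueError(f"unsupported operator {op!r}")
    if not type_comparable(x[0], y[0]):
        raise TypeError(f"{x[0]} and {y[0]} are not type comparable")
    same = x[1] == y[1]
    return same if op == "=" else not same


print(object_condition("=", ("TA", "p7"), ("Person", "p7")))   # True
print(type_comparable("Student", "Teacher"))                  # False
```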

The SELECT subclause operates on the subdatabase returned by the CONTEXT clause and its optional WHERE subclause to produce a new subdatabase, which results from "projecting" the Context subdatabase over some classes and/or descriptive attributes. A resulting subdatabase can be operated on by the operation(s) specified in an OPERATION clause. We note that the Select operation in OQL does not imply displaying data to the user, as it does in SQL.

The OPERATION clause specifies a set of messages (operation names) to be sent to the classes of the subdatabase that is returned by the Context expression. Each message may be followed by one or more arguments that identify the recipient classes. Thus, several operations can be performed over the same or different classes, and a single operation can be performed over several classes resulting from a CONTEXT clause. An operation can be either a system-defined data manipulation operation (e.g., Display, Update, Print) or a user-defined operation (e.g., Rotate, Order-part, Hire-employee).

If the Display operation is specified in the OPERATION clause, it causes the values of the descriptive attributes that appear in the subdatabase to be displayed to the user (i.e., on the screen) in the form of a table. If the Display operation is not followed by a class name as an argument, the resulting table is a first normal form table defined over all identified attributes. Otherwise, the argument class identifies a "viewpoint" based on which the resulting descriptive data are organized in the form of a non-normalized table (i.e., a table in which a value can be a nested table or a set). Thus, the descriptive data are structured under the objects of the argument class. The Print operation behaves similarly to the Display operation, except that the resulting table is sent to the printer.
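The difference between the two table forms can be shown with a short Python sketch over a hypothetical Teacher * Section subdatabase. The names, section numbers, and function names are invented for illustration. Without a viewpoint class, the result is one flat row per extensional pattern. With Teacher as the viewpoint, rows are grouped by Teacher object (by OID, not by name) and the section numbers are nested as a set.

```python
from collections import defaultdict

# Hypothetical descriptive data for a Teacher * Section subdatabase.
NAME = {"t1": "Smith", "t2": "Jones"}            # Teacher.name
SECTION_NO = {"s1": 101, "s2": 102, "s3": 103}   # Section.section#
PATTERNS = [("t1", "s1"), ("t1", "s2"), ("t2", "s3")]


def display_flat(patterns):
    """No viewpoint class: a first normal form table, one row per pattern."""
    return [(NAME[t], SECTION_NO[s]) for t, s in patterns]


def display_nested(patterns):
    """Viewpoint Teacher: one row per Teacher object with a nested set of section#'s."""
    rows = defaultdict(set)
    for t, s in patterns:
        rows[t].add(SECTION_NO[s])
    return [(NAME[t], secs) for t, secs in rows.items()]


print(display_flat(PATTERNS))    # [('Smith', 101), ('Smith', 102), ('Jones', 103)]
print(display_nested(PATTERNS))  # [('Smith', {101, 102}), ('Jones', {103})]
```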

The operators that can be used in the association pattern expression of the CONTEXT clause are the association operator and the nonassociation operator, which are described in the following two sections.

4.2.2 The Association Operator

When the association operator (*) is applied to two directly associated E-classes A and B in a database (i.e., the expression "A * B"), it returns a subdatabase whose intensional pattern consists of the two classes A and B and their association. The resulting subdatabase also contains the set of extensional patterns drawn from the operand database such that each extensional pattern contains objects of both A and B (i.e., extensional patterns that are of the type <A,B>). B objects that are not associated with any A object, and A objects that are not associated with any B object, in the operand database are not retained in the resulting subdatabase. In other words, the resulting subdatabase includes the set of extensional patterns EP such that each extensional pattern ep in EP contains non-null values of A and B.

In order to formally define the association operator, we denote by EP(APE) the set of extensional patterns (EP) that are returned by an association pattern expression (APE). Also, ep(CL) denotes the object from the class CL that belongs to the extensional pattern ep, provided that CL belongs to the corresponding intensional pattern. For example, in Figure 3.3, if ep = <t3,s4>, then ep(Teacher) = t3, ep(Section) = s4, and ep(Course) = Null. The association operator is formally defined as follows:

$$EP(A * B) = \{\, ep : ep(A) \neq \mathrm{Null} \text{ and } ep(B) \neq \mathrm{Null} \,\}$$
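A minimal Python sketch of this definition, assuming the (Teacher, Section, Course) tuple encoding of SDB used for illustration, with None for Null. The helper names are hypothetical. The definition itself only filters patterns. The sketch then projects the survivors onto the operand classes and removes duplicates, because the resulting subdatabase's intensional pattern contains only those classes; this matches the Query 1 result stated later in this section. The sketch does not check that consecutive operand classes are directly associated.

```python
CLASSES = ("Teacher", "Section", "Course")

# Extensional patterns of SDB named in the text; None marks a Null component.
SDB = {
    ("t1", "s2", "c1"), ("t2", "s3", "c1"), ("t2", "s3", "c2"),
    ("t3", "s4", None), (None, "s5", "c4"),
    ("t4", None, None), (None, None, "c3"),
}


def component(ep, cls):
    """ep(CL): the object of class cls in pattern ep, or None for Null."""
    return ep[CLASSES.index(cls)]


def associate(patterns, *classes):
    """EP(A * B * ...): keep patterns that are non-null for every operand class,
    then project them onto the operand classes, dropping duplicates."""
    return {
        tuple(component(ep, c) for c in classes)
        for ep in patterns
        if all(component(ep, c) is not None for c in classes)
    }


print(sorted(associate(SDB, "Teacher", "Section")))
# [('t1', 's2'), ('t2', 's3'), ('t3', 's4')]
```

Because the function accepts any number of operand classes, it also covers the multi-class generalization of the operator discussed after Query 1.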

The following example queries illustrate the use of the association operator.

Query 1: Display the names of the teachers who teach some sections and the section#'s for these sections.

CONTEXT Teacher * Section
SELECT name, section#
DISPLAY Teacher

If the Context expression in this query is applied to the subdatabase SDB of Figure 3.3, it returns a new subdatabase whose set of extensional patterns is {<t1,s2>, <t2,s3>, <t3,s4>}. The extensional pattern <t4> (or <t4,Null>) is not included in this set because its Section-component is Null (similarly, the pattern <s5> is not included). Figure 4.1a shows the intensional pattern of this subdatabase, where all the descriptive attributes inherited by the classes Section and Teacher appear with them for the convenience of the reader. The SELECT subclause derives a new subdatabase from the Context subdatabase. The intensional pattern of the new subdatabase, as shown in Figure 4.1b, consists of the classes Teacher and Section and only the two attributes that are referenced in the SELECT subclause, namely, Name of Teacher and Section# of Section (i.e., the other attributes are not inherited by the classes that appear in this subdatabase). If all the descriptive attributes of a class are to be retained in (i.e., inherited by) the subdatabase, the following rule can be applied.

Rule #1: If all the descriptive attributes of a class are to be retained in the subdatabase derived by the SELECT subclause, the class name can be referenced in the subclause without specifying any of its attributes, i.e., the default is "all attributes."

A SELECT subclause can also select a subset of E-classes, thus producing a subdatabase that results from dropping the unreferenced E-classes from the operand subdatabase. (A class is considered "unreferenced" in a SELECT subclause if none of its descriptive attributes is referenced in it.) If a class to be dropped from the operand subdatabase connects two classes that are to be retained, a new direct association is derived between these two retained classes in the produced subdatabase. At the extensional level, direct links between the instances of the two classes are inferred. For example, the subdatabase that results from selecting the classes Teacher and Course of the subdatabase SDB of Figure 3.3 is shown in Figure 4.2.
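The inference of direct links when an intermediate class is dropped can be sketched in Python, again assuming the (Teacher, Section, Course) tuple encoding of SDB with None for Null; the function name is hypothetical, and the sketch does not claim to reproduce every detail of Figure 4.2. Projecting each pattern onto the retained classes turns a Teacher-Section-Course path into a direct Teacher-Course link, while patterns that touch only one retained class survive as unlinked objects.

```python
CLASSES = ("Teacher", "Section", "Course")

# Extensional patterns of SDB named in the text; None marks a Null component.
SDB = {
    ("t1", "s2", "c1"), ("t2", "s3", "c1"), ("t2", "s3", "c2"),
    ("t3", "s4", None), (None, "s5", "c4"),
    ("t4", None, None), (None, None, "c3"),
}


def select_classes(patterns, keep):
    """Project patterns onto the kept classes. When a dropped class sat between
    two kept ones, the projected pair is an inferred direct link."""
    idx = [CLASSES.index(c) for c in keep]
    projected = {tuple(ep[i] for i in idx) for ep in patterns}
    return {p for p in projected if any(o is not None for o in p)}


result = select_classes(SDB, ("Teacher", "Course"))
links = {p for p in result if None not in p}
print(sorted(links))   # [('t1', 'c1'), ('t2', 'c1'), ('t2', 'c2')]
```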

The Display operation in the above query causes the result to be displayed in a non-normalized tabular form, in which each tuple consists of a teacher's name and the set of section#'s for the sections he/she teaches. This is because the argument of the Display operation indicates that the result is to be viewed from the point of view of the class Teacher. We note that the result of a Display operation does not belong to the world of subdatabases and therefore cannot be operated on using the OQL operators. However, the subdatabase that corresponds to a certain displayed result can be further operated on to produce a new subdatabase whose descriptive attributes can also be displayed.

The definition of the association operator can easily be
generalized to the case in which the association pattern
expression contains more than two classes. For example,
the expression "A * B * C" returns the extensional
patterns of the type <A,B,C>. (Note that an expression
that uses only the association operator defines a single
extensional pattern type.) A mechanism for defining a
richer variety of extensional pattern types in a single
expression is described in Section 4.2.4.
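
At the extensional level, a chain such as "A * B * C" can be sketched as repeatedly extending partial patterns along the direct links between adjacent classes. In the minimal Python sketch below, the representation of links as sets of object-identifier pairs and the name `associate` are assumptions. Only instances connected all the way through survive, so every result pattern has the single type <A,B,C>.

```python
def associate(links: list[set[tuple[str, str]]]) -> set[tuple[str, ...]]:
    """Evaluate a chain expression such as A * B * C.

    links[i] holds the direct links between instances of the i-th and
    (i+1)-th classes of the chain. A pattern is kept only if it can be
    extended across every link set.
    """
    patterns: set[tuple[str, ...]] = set(links[0])
    for step in links[1:]:
        patterns = {p + (y,) for p in patterns for x, y in step if p[-1] == x}
    return patterns


a_b = {("a1", "b1"), ("a2", "b2")}
b_c = {("b1", "c1"), ("b3", "c3")}
print(associate([a_b, b_c]))  # {('a1', 'b1', 'c1')}
```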


Query 2 Display the department names for all departments
that offer 6000-level courses that have current offerings
(sections). Also, display the titles of these courses and
the textbooks used in each section. In addition, print
the results.

CONTEXT Department *
        Course [6000 <= c# < 7000] * Section
SELECT name, title, textbook
DISPLAY
PRINT


Two  operations  are  specified  in  the  OPERATION  clause 
of  this  query,  namely,  Display  and  Print.  These 
operations are to be performed on the subdatabase
returned by the SELECT subclause. Also, the intra-class
condition  on  the  C#  attribute  of  Course  is  enclosed  in 
brackets  following  the  class  name  in  the  Context 
expression.  The  result  of  the  Display  or  Print  operation 
is  a normalized  table  since  neither  of  the  two  operations 
is  followed  by  a viewpoint  class. 

Aggregation and Generalization Differences: As Query 3
below illustrates, an association (or nonassociation)
operator can be used between any two classes, whether they
are connected by a generalization or an aggregation
association. An association pattern expression is concerned
only with whether some classes and their instances are
associated with one another or not. It does not specify
what types of associations relate them, because the types
of the associations connecting them have already been
explicitly defined in the schema, and restating these
association types in queries is unnecessary. The query
processor of an OO DBMS can make use of the type
information stored in the data dictionary to properly
interpret the queries and enforce the relevant semantics
and constraints. For example, a link that exists between
an instance of the class TA and an instance of the class
Grad is an identity link; in other words, the semantics
implied by the generalization association here is that the
two instances are actually two different perspectives of
the same real-world object. This is different from the
semantics implied by an aggregation association, in which
case a link represents a relationship between two different
real-world objects.

Before presenting Query 3, we introduce the following two
general rules that are relevant to it.

Rule #2 An attribute that appears in the SELECT or
WHERE subclause has to be qualified by its
class name only if it is not unique among the
attributes of the classes that are referenced
in the CONTEXT clause.

Rule #3 Different aliases (range or iteration
variables) of a class can be generated in OQL
by appending an underscore and an integer to
the class name in an association pattern
expression (e.g., Grad_1 is an alias of Grad).

Query 3 Print the names of graduate students who teach
other graduate students in some sections. Also, print the
names of those graduate students they teach. Organize the
result from the point of view of the teaching graduate
students.

CONTEXT Grad_1 * TA * Teacher * Section *
        Student * Grad_2
SELECT Grad_1 [name], Grad_2 [name]
PRINT Grad_1

In this query, TA inherits the status of being related to
Section from both Teacher and Student, each with its own
distinct meaning. Thus, using the expression
"TA * Teacher * Section" instead of the expression
"TA * Section" explicitly states that we are interested in
TA as playing the role of Teacher rather than the role of
Student of a section. Also, in this query, the SELECT
subclause projects over the two classes Grad_1 and Grad_2,
since they are the only classes referenced in it. Thus, the
intensional pattern of the final subdatabase contains these
two classes, a derived aggregation association between
them, and the attribute Name of each class.

4.2.3 The Nonassociation Operator

We use the exclamation mark (!) to denote this
operator.  When  this  operator  is  applied  to  two  directly 
associated  E-classes  A and  B in  a schema  (i.e.,  the 
expression  "A  ! B"),  it  returns  a subdatabase  which 
contains  only  the  instances  of  A that  are  not  associated 
with  any  instance  of  B and  the  instances  of  B that  are 
not  associated  with  any  instance  of  A.  For  example,  the 
two  instances  t4  and  s5  are  returned  in  the  subdatabase 
that  results  when  the  expression  "Teacher  ! Section"  is 
applied  to  the  subdatabase  SDB  of  Figure  3.3  (i.e.,  it 
returns  the  teachers  who  are  not  assigned  to  any  sections 
and  the  sections  which  are  not  assigned  to  any  teacher). 

Formally, the nonassociation operator is defined as
follows:

EP(A ! B) = {ep : ep(A) = NULL or ep(B) = NULL}
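
For the simple case of two directly associated classes, the definition can be sketched in Python as keeping the instances on either side that take part in no link. The link representation and the name `nonassociate` are hypothetical. The sample data is invented only so that, as in the text, "Teacher ! Section" returns t4 and s5.

```python
def nonassociate(a_objs: set[str], b_objs: set[str],
                 links: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Evaluate A ! B for two directly associated classes.

    Returns (side, oid) pairs: side "A" for instances of A with no link
    to any instance of B, side "B" for instances of B with no link to any
    instance of A. Each such instance forms a one-element pattern whose
    other component is NULL.
    """
    linked_a = {a for a, _ in links}
    linked_b = {b for _, b in links}
    return ({("A", a) for a in a_objs - linked_a}
            | {("B", b) for b in b_objs - linked_b})


teachers = {"t1", "t2", "t3", "t4"}
sections = {"s1", "s2", "s4", "s5"}
teaches = {("t1", "s1"), ("t2", "s2"), ("t3", "s4")}
print(sorted(nonassociate(teachers, sections, teaches)))
# [('A', 't4'), ('B', 's5')]
```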

The association operator has higher precedence than
the nonassociation operator. As an example, when the
association pattern expression "Teacher ! Section *
Course" is applied to the classes of the subdatabase SDB
of Figure 3.3, it produces a new subdatabase that
contains the following set of extensional patterns:

{<s5,c4>, <t3>, <t4>}

The pattern <t3> is retained in this result because of
the higher precedence of the association operator over
the nonassociation operator. When the association
operator is applied first, object s4 is not retained in
the result (since it is not associated with any Course
object), so object t3 is not associated with any Section
object in this result. When the nonassociation operator
is applied next, it causes all the Teacher objects that
are associated with any pattern of the type
<Section,Course> (i.e., the objects t1 and t2) to be
dropped together with these <Section,Course> patterns.
Thus, the final result contains the above set of
extensional patterns. The precedence of the association
operator over the nonassociation operator can be
overridden by parentheses. The following is an example
query that uses the nonassociation operator.

Query 4 Display the names of those graduate students who
are TAs but not RAs.

CONTEXT TA * Grad ! RA
SELECT TA [name]
DISPLAY


4.2.4 Association Pattern Subexpressions

By using only the association operator in an
association pattern expression, one can identify a single
extensional pattern type. In some situations, extensional
patterns of different types may be desired in the
resulting subdatabase. This can be done in OQL by
enclosing a subexpression of the association pattern
expression inside braces. This subexpression identifies a
certain extensional pattern type. For example, the
expression "A * {B * C} * D" returns the subdatabase
whose intensional pattern consists of these four classes
and whose set of extensional patterns includes all
patterns that are of the types <A,B,C,D> and <B,C>. In
other words, this expression selects both the instances
of the classes A, B, C, and D that are connected
(associated) all the way through and those instances of
B and C that are connected to each other but not
necessarily connected to the instances of A and/or D.
The braces around B * C capture the semantics of the
outerjoin concept introduced in [COD79].

If, in the above expression, an extensional pattern
of the type <B,C> appears in the resulting subdatabase as
part of a larger pattern of the type <A,B,C,D>, it does
not appear independently in that resulting subdatabase.
For example, if the original database contains only the
two patterns <a1,b5,c5,d5> and <a3,b2,c2>, then the
expression "A * {B * C} * D" returns the extensional
patterns <a1,b5,c5,d5> and <b2,c2>. The extensional
pattern <b5,c5> does not appear independently in the
result since it already appears as a part of the
extensional pattern <a1,b5,c5,d5>. In general, an
extensional pattern of a certain specified type does not
appear independently in the result if it is part of a
larger extensional pattern.
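
The combined effect of braces and the rule that embedded patterns do not appear independently can be checked with a short Python sketch of "A * {B * C} * D". The link sets and the name `a_bc_d` are hypothetical, and the sketch handles only this one expression shape. The data reproduces the example above, in which the database holds only <a1,b5,c5,d5> and <a3,b2,c2>.

```python
def a_bc_d(a_b: set[tuple[str, str]], b_c: set[tuple[str, str]],
           c_d: set[tuple[str, str]]) -> set[tuple[str, ...]]:
    """Evaluate A * {B * C} * D with subsumption of embedded patterns.

    Result patterns have type <A,B,C,D> (length 4) or <B,C> (length 2).
    A <B,C> pattern is reported on its own only if it is not already
    part of some <A,B,C,D> pattern.
    """
    full = {(a, b, c, d)
            for a, b in a_b
            for b2, c in b_c if b2 == b
            for c2, d in c_d if c2 == c}
    embedded = {(b, c) for _, b, c, _ in full}
    return full | (b_c - embedded)


result = a_bc_d({("a1", "b5"), ("a3", "b2")},
                {("b5", "c5"), ("b2", "c2")},
                {("c5", "d5")})
print(sorted(result, key=len, reverse=True))
# [('a1', 'b5', 'c5', 'd5'), ('b2', 'c2')]
```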

Subexpressions can be nested to several levels. For
example, the expression "{{{A} * B} * C} * D" identifies
the extensional pattern types <A>, <A,B>, <A,B,C>, and
<A,B,C,D>.

Query 5 Display the ss#'s of all graduate students
(whether they have advisors or not) and, for those
graduate students who have advisors, display their
advisors' names.

CONTEXT {Grad} * Advising * Faculty
SELECT Grad [ss#], Faculty [name]
DISPLAY

The attribute Name in the SELECT subclause of this query
is qualified by its class name because it is not unique
among the classes referenced in the CONTEXT clause (see
Rule #2 above); the same holds for the attribute ss#.

Query 6 Display the ss#'s of the students who have taken
courses that belong to the department of electrical
engineering and the c#'s of these courses. Also display
the names of those students who are graduate
students.

CONTEXT {Department [name = 'EE'] * Course *
        Transcript * Student} * Grad
SELECT c#, Student [ss#], Grad [name]
DISPLAY

The  intensional  pattern  of  the  subdatabase  returned  by 
the  Select  clause  is  shown  in  Figure  4.3  (the  inherited 
descriptive  attribute  values  are  also  represented  in  the 
figure),  where  a generalization  association  connects 
Student  to  Grad.  In  this  subdatabase,  only  graduate 
students  shall  have  values  for  the  attribute  Name,  even 
though,  in  the  original  database,  all  students  may  have 
Name  values. 


4.3 Advanced Features of OQL
4.3.1 Branching Association Patterns


An  association  pattern  expression  may  contain 
branches  expressed  by  an  AND  or  an  OR  operator.  There  can 
be several nested levels of branching. For example, the
expression  "A  * B * AND  (C  * OR  (D  * E , F),  G * H)"  is  a 
branching  association  pattern  expression,  which 
corresponds  to  the  intensional  pattern  shown  in  Figure 
4.4.  A class  at  which  the  branching  occurs  is  called  a 
fork  class  (e.g.,  B and  C in  the  above  expression).  An 
AND  operator  means  that,  in  the  result,  an  instance  from 
the  fork  class  must  be  associated  with  instances  from 
both  of  the  branches,  while  an  OR  operator  means  that  an 
instance  from  the  fork  class  must  be  associated  with  an 
instance  from  at  least  one  of  the  two  branches.  Figure 
4.5  shows  some  association  pattern  expressions  that 
represent  networks  of  classes  and  associations  together 
with  a graphical  representation  of  the  extensional 
pattern  types  they  define. 
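
The AND/OR semantics at a fork class can be sketched in Python by filtering the fork instances according to which branches they reach. The sketch covers only one level of branching with two branches, each reduced to the set of direct links leaving the fork class. The function name and data are hypothetical. In Query 7 below, for instance, Section plays the role of the fork class and the two branches lead to Course and RA.

```python
def filter_fork(fork_objs: set[str], branch1: set[tuple[str, str]],
                branch2: set[tuple[str, str]], op: str) -> set[str]:
    """Keep the fork-class instances allowed by AND or OR.

    branch1 and branch2 hold (fork_oid, branch_oid) links. AND keeps an
    instance linked into both branches; OR keeps one linked into at
    least one branch.
    """
    reach1 = {f for f, _ in branch1}
    reach2 = {f for f, _ in branch2}
    if op == "AND":
        return fork_objs & reach1 & reach2
    if op == "OR":
        return fork_objs & (reach1 | reach2)
    raise ValueError(f"unknown branching operator: {op}")


b_objs = {"b1", "b2", "b3", "b4"}
b_c = {("b1", "c1"), ("b2", "c2")}
b_d = {("b1", "d1"), ("b3", "d3")}
print(sorted(filter_fork(b_objs, b_c, b_d, "AND")))  # ['b1']
print(sorted(filter_fork(b_objs, b_c, b_d, "OR")))   # ['b1', 'b2', 'b3']
```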

Query 7 Print the name of any faculty member who is
teaching any section of a course that is offered by the
'EE' department, provided that the section is taken by at
least one graduate student who is an RA. Also, print the
c# value(s).

CONTEXT Faculty * Section *
        AND (Course * Department [name = 'EE'], RA)
SELECT Faculty [name], c#
PRINT

OQL makes full use of the inheritance property of the
generalization association. In this query, Faculty and RA
inherit the association to Section from their
superclasses.  Hence,  using  the  expression  "Faculty  * 
Section"  instead  of  "Faculty  * Teacher  * Section"  and  the 
expression  "Section  * RA"  instead  of  "Section  * Student  * 
Grad  * RA"  is  legal. 

4.3.2  Set  Operators 

The  set  operators  Union,  Intersection,  and 
Difference  can  be  applied  to  any  two  union-compatible 
subdatabases  to  produce  a new  subdatabase.  Two 
subdatabases  are  said  to  be  union-compatible  if  both  of 
them  have  the  same  intensional  association  pattern.  Thus, 
the  following  is  a legal  format  for  a query. 

CONTEXT  A * {B  * C * D} 

UNION 

{A  * B * C}  * D 

SELECT  <classes  and  attributes> 

The  first  argument  to  the  Union  operator  returns  a 
subdatabase  whose  extensional  patterns  are  of  the  types 
<A,B,C,D>  and  <B,C,D>  while  the  second  argument  returns  a 
subdatabase  whose  extensional  patterns  are  of  the  types 
<A,B,C,D>  and  <A,B,C>.  However,  both  subdatabases  have 
the  same  intensional  pattern  that  consists  of  the  four 
classes  A,  B,  C,  and  D and  their  associations.  The  result 
of  the  Union  operation  is  a subdatabase  that  contains 
extensional  patterns  of  the  three  types  <A,B,C,D>, 
<B,C,D>,  and  <A,B,C>,  i.e.,  it  contains  the  set  of  all 
extensional patterns that belong to the first subdatabase,
the second subdatabase, or both. The SELECT
subclause  is  then  applied  to  the  resulting  subdatabase. 
The  Difference  operator  returns  a subdatabase  that 
contains  the  set  of  all  extensional  patterns  that  belong 
to  the  first  subdatabase  but  not  to  the  second 
subdatabase.  The  subdatabase  returned  by  the  Intersection 
operator  contains  the  set  of  extensional  patterns  that 
belong  to  both  subdatabases. 
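
A hedged Python sketch of the three set operators follows. It assumes a subdatabase can be modeled as its intensional pattern plus a set of extensional patterns. Each pattern is a frozenset of (class, object) pairs, so patterns of different types such as <A,B,C,D> and <B,C,D> can coexist. The class and function names are illustrative, not part of OQL. Each operator first checks union-compatibility.

```python
from dataclasses import dataclass

# An extensional pattern: a frozenset of (class_name, oid) pairs.
Pattern = frozenset[tuple[str, str]]


@dataclass(frozen=True)
class Subdatabase:
    intension: frozenset[str]      # classes and associations of the intensional pattern
    patterns: frozenset[Pattern]   # extensional patterns, possibly of several types


def _require_compatible(x: Subdatabase, y: Subdatabase) -> None:
    if x.intension != y.intension:
        raise ValueError("operands are not union-compatible")


def union(x: Subdatabase, y: Subdatabase) -> Subdatabase:
    _require_compatible(x, y)
    return Subdatabase(x.intension, x.patterns | y.patterns)


def intersection(x: Subdatabase, y: Subdatabase) -> Subdatabase:
    _require_compatible(x, y)
    return Subdatabase(x.intension, x.patterns & y.patterns)


def difference(x: Subdatabase, y: Subdatabase) -> Subdatabase:
    _require_compatible(x, y)
    return Subdatabase(x.intension, x.patterns - y.patterns)


def pat(**objs: str) -> Pattern:
    """Build a pattern from keyword arguments, e.g. pat(A="a1", B="b1")."""
    return frozenset(objs.items())


schema = frozenset({"A", "B", "C", "D", "A-B", "B-C", "C-D"})
first = Subdatabase(schema, frozenset({pat(A="a1", B="b1", C="c1", D="d1"),
                                       pat(B="b2", C="c2", D="d2")}))
second = Subdatabase(schema, frozenset({pat(A="a1", B="b1", C="c1", D="d1"),
                                        pat(A="a3", B="b3", C="c3")}))
print(len(union(first, second).patterns))         # 3
print(len(intersection(first, second).patterns))  # 1
```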

Set operators can be used to create subdatabases in
which some descriptive attribute values of an object
appear (i.e., are inherited by the classes of the
subdatabase from their source classes) only if that
object participates in certain patterns of associations.
This is illustrated by the following query.

CONTEXT  Teacher  * Section  * Course  [c#  >=  5000] 

SELECT  name,  degree,  section# 

UNION 

CONTEXT Teacher * Section * Student * Grad
SELECT Teacher [name, ss#], section#

The  two  arguments  of  the  Union  operator  are  two  OQL 
queries  that  return  two  union-compatible  subdatabases 
(since  the  SELECT  subclause  in  each  query  projects  over 
the  two  classes  Teacher  and  Section,  i.e.,  both 
subdatabases  have  the  same  intensional  pattern).  This 
means that each of the two queries returns extensional
patterns of the type <Teacher,Section>, but each query
derives these patterns based on different conditions. The
class Teacher in the first subdatabase has the
descriptive attributes Name and Degree, while in the
second subdatabase it has the descriptive attributes Name
and ss#. The intensional patterns of these two
subdatabases are shown in Figure 4.6a. In the final
subdatabase, i.e., the result of the Union operation,
only the Teacher instances that appeared in both operand
subdatabases will have values for all three attributes
Name, Degree, and ss#. Those Teacher instances that
appeared in the first subdatabase but not in the second
subdatabase will have Null values for the ss# attribute
in the final subdatabase. The same holds, with respect to
Degree values, for those Teacher instances that appeared
in the second subdatabase but not in the first
subdatabase. Figure 4.6b shows the intensional pattern of
the final subdatabase, in which the generic class Teacher
has the attribute Name and each of the two subclasses
Teacher_1 and Teacher_2 has one of the other two
attributes. The instances of the Teacher_1 and Teacher_2
subclasses are those derived from the first and second
operands of the Union operator, respectively.
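
How attribute values end up only partially filled can be sketched in Python. The sketch assumes each operand contributes a mapping from Teacher object identifiers to the attribute values it carries. Merging by object identity gives each Teacher every attribute that some operand supplied, and None (standing in for Null) elsewhere. It flattens the Teacher, Teacher_1, and Teacher_2 generalization of Figure 4.6b into one record per object. The attribute keys, names, and values are hypothetical.

```python
from typing import Optional

Row = dict[str, str]


def merge_instances(first: dict[str, Row], second: dict[str, Row],
                    attributes: tuple[str, ...]) -> dict[str, dict[str, Optional[str]]]:
    """Union the Teacher instances of two operands, keyed by object id.

    An instance gets a value for every attribute supplied by an operand
    it appears in; all other attributes are None (Null).
    """
    merged = {}
    for oid in first.keys() | second.keys():
        values = {**first.get(oid, {}), **second.get(oid, {})}
        merged[oid] = {attr: values.get(attr) for attr in attributes}
    return merged


first = {"t1": {"name": "Ann", "degree": "PhD"},
         "t2": {"name": "Bob", "degree": "MS"}}
second = {"t1": {"name": "Ann", "ss#": "111"},
          "t3": {"name": "Cy", "ss#": "333"}}
result = merge_instances(first, second, ("name", "degree", "ss#"))
print(result["t2"])  # {'name': 'Bob', 'degree': 'MS', 'ss#': None}
```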

Note that the nonassociation operator of OQL can be
defined in terms of the association operator, the
Difference operator (introduced in this section), and
braces, as follows:

A ! B = {A} * {B} - A * B

where "-" stands for the set difference operator
described above. The following equation gives the
expression equivalent to an expression that contains both
an association operator and a nonassociation operator:

A ! B * C = {A} * {B * C} - A * B * C

Though the nonassociation operator can be defined in
terms of other OQL operators, it is provided as a shorter
notation for ease of use.
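
The second equation can be checked on small data with a Python sketch that evaluates both sides. It assumes three hypothetical link sets. For the brace terms it applies the subsumption rule for braces described earlier: a partial pattern appears only if it is not part of a larger one. The data is invented so that, as in the earlier "Teacher ! Section * Course" example, both sides yield {<s5,c4>, <t3>, <t4>}.

```python
def full_chain(a_b, b_c):
    """A * B * C: patterns connected all the way through."""
    return {(a, b, c) for a, b in a_b for b2, c in b_c if b2 == b}


def via_nonassociation(a_objs, a_b, b_c):
    """A ! B * C, read as A ! (B * C) because * binds tighter than !."""
    bc_heads = {b for b, _ in b_c}
    linked = {(a, b) for a, b in a_b if b in bc_heads}
    lone_a = {(a,) for a in a_objs if all(x != a for x, _ in linked)}
    lone_bc = {(b, c) for b, c in b_c if all(y != b for _, y in linked)}
    return lone_a | lone_bc


def via_difference(a_objs, a_b, b_c):
    """{A} * {B * C} - A * B * C, with subsumption of partial patterns."""
    full = full_chain(a_b, b_c)
    outer = set(full)
    outer |= {(a,) for a in a_objs if all(p[0] != a for p in full)}
    outer |= {(b, c) for b, c in b_c if all(p[1:] != (b, c) for p in full)}
    return outer - full


teachers = {"t1", "t2", "t3", "t4"}
teaches = {("t1", "s1"), ("t2", "s2"), ("t3", "s4")}
offered = {("s1", "c1"), ("s2", "c2"), ("s5", "c4")}
print(via_nonassociation(teachers, teaches, offered)
      == via_difference(teachers, teaches, offered))  # True
```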

4.3.3  Queries  with  Multiple  Association  Pattern 
Expressions 

In OQL, one can specify comparison conditions
between attribute values or between objects of two
different classes that appear in two different
association pattern expressions. In this case, the two
expressions are separated by a comma in the CONTEXT
clause. Before presenting Query 8, which contains
multiple association pattern expressions, we give the
following rule that is relevant to the query.

Rule  #4  If  the  association  operator  is  used  between  two 
classes  that  are  connected  by  more  than  one 
association  (in  this  case  the  association  links 
have  to  be  distinctively  named  in  the  schema), 
then  the  name  of  the  intended  association  needs 
to  be  specified  after  the  association  operator. 

For example, there are two associations between the
classes Undergrad and Department (Figure 3.1): the
association labeled Minor and the inherited association
labeled Major. Hence, the expression "Undergrad *Major
Department" is used to refer to undergraduate students
and their major departments.

Query 8 Display the names of all the undergraduate
students minoring in the major department of the
undergraduate student whose ss# = xxx. Also display the
name of the department.

CONTEXT Undergrad_1 *Minor Department_1,
        Undergrad_2 [ss# = xxx] *Major Department_2
WHERE Department_1 = Department_2
SELECT Undergrad_1 [name], Department_1 [name]
DISPLAY

The  two  association  pattern  expressions  of  the 
CONTEXT  clause  in  the  above  query  create  two 
subdatabases.  The  two  subdatabases  are  then  linked  by  the 
condition  stated  in  the  WHERE  subclause  to  produce  a new 
subdatabase  to  which  the  SELECT  subclause  is  applied. 
Also, in this query, the comparison in the WHERE
subclause is performed between two type-comparable
objects (i.e., objects that belong to the same class or
to two different classes of the same generalization
hierarchy).
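
The way the WHERE subclause of Query 8 links the two subdatabases can be sketched in Python as a join on object identity. The sketch assumes the first expression yields (undergrad, department) pairs over the Minor association and the second yields (undergrad, department) pairs over Major. For simplicity, the student whose ss# = xxx is represented directly by a hypothetical object identifier. All identifiers and the function name are illustrative. Departments are compared as objects, not through a name attribute.

```python
def minors_in_major_department(minor: set[tuple[str, str]],
                               major: set[tuple[str, str]],
                               student: str) -> set[tuple[str, str]]:
    """Join Undergrad_1 *Minor Department_1 with
    Undergrad_2 *Major Department_2 on Department_1 = Department_2,
    where Undergrad_2 is restricted to one selected student."""
    major_depts = {dept for who, dept in major if who == student}
    return {(who, dept) for who, dept in minor if dept in major_depts}


minor = {("u1", "d_cs"), ("u2", "d_ee"), ("u3", "d_cs")}
major = {("u9", "d_cs"), ("u2", "d_me")}
print(sorted(minors_in_major_department(minor, major, "u9")))
# [('u1', 'd_cs'), ('u3', 'd_cs')]
```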

4.3.4 Nested Association Pattern Expressions

A consequence of maintaining the closure property in
OQL is that nested association pattern expressions can be
used. For example, the following is an association
pattern expression that contains another nested
expression (assuming that the class B is directly
associated with each of the classes A, C, J, and K):

CONTEXT J * (A * B * C):B * K

where "A * B * C" is a nested association pattern
expression that identifies a certain subdatabase whose B
component is referenced by the outer association pattern
expression.

Another  way  to  express  the  same  functionality  is 
first  to  make  use  of  the  assignment  operator  to  name  and 
save  the  subdatabase  returned  by  the  nested  association 
pattern  expression  as  follows  (where  X is  the  name  of  the 
subdatabase) : 

X :=  CONTEXT  A * B * C 

The  second  step  is  to  use  the  subdatabase  name  to  qualify 
the  class  name  in  another  independent  expression  as 
follows : 

CONTEXT  J * X:B  * K 

4.4 Update Operations

The  operations  Update,  Delete,  Insert,  Associate, 
and  Dissociate  can  be  specified  in  the  OPERATION  clause 
of an OQL query. The Update operation modifies the values
of the descriptive attributes of objects to some new
values as specified in the argument list. Delete and
Insert are used to delete some existing objects or insert
new objects into the database. The Associate operation is
used to create an association (link) between a pair of
objects specified in the argument list. The Dissociate
operation breaks the association (link) between a pair of
associated objects. The complete syntax of these
operations is given in Appendix A. In the
following,  we  provide  some  example  queries  that  involve 
update  operations.  The  first  example  below  is  a query 
that  updates  the  Title  attribute  for  the  course  whose  C# 
is  6120  to  "file  management." 

CONTEXT  Course  [c#  = 6120] 

UPDATE  Title  "file  management" 

The  following  query  inserts  an  instance  into  the 
class  Student  and  sets  the  ss#  and  Name  attribute  values 
for  the  newly  inserted  instance  to  "555"  and  "Fred," 
respectively.

CONTEXT  Student 

INSERT  ss#  555,  Name  "Fred" 

The following query deletes from the database all the
sections that belong to the course whose C# is 6120.

CONTEXT Section * Course [c# = 6120]
DELETE Section

The following query dissociates any associated
Student  and  Section  instances  provided  that  the  student's 
classification  is  less  than  5 and  the  section  belongs  to 
a graduate  course  (i.e.,  with  C#  >=  5000). 

CONTEXT  Student  [classification  < 5]  * Section  * 

Course  [c#  >=  5000] 

DISSOCIATE  Student,  Section 
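
The effect of the Dissociate example can be sketched in Python over a toy in-memory store. The dictionaries standing in for the Student, Section, and Course data, and the function name, are hypothetical. Only the Student-Section links inside the context are removed. The objects themselves, and links outside the context, are left untouched.

```python
def dissociate_context(enrolled: set[tuple[str, str]],
                       classification: dict[str, int],
                       course_of: dict[str, str],
                       course_no: dict[str, int]) -> set[tuple[str, str]]:
    """Return the Student-Section links that remain after
    CONTEXT Student [classification < 5] * Section * Course [c# >= 5000]
    DISSOCIATE Student, Section."""
    in_context = {(student, section) for student, section in enrolled
                  if classification[student] < 5
                  and course_no[course_of[section]] >= 5000}
    return enrolled - in_context


enrolled = {("st1", "sec1"), ("st2", "sec1"), ("st1", "sec2")}
remaining = dissociate_context(enrolled,
                               {"st1": 3, "st2": 6},
                               {"sec1": "crs1", "sec2": "crs2"},
                               {"crs1": 6120, "crs2": 3010})
print(sorted(remaining))  # [('st1', 'sec2'), ('st2', 'sec1')]
```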

4.5  Conclusion 

In this chapter, we have introduced the query
language OQL for manipulating OO databases. A query in
OQL returns a subdatabase, which has structural
properties  that  are  similar  to  those  of  the  original 
database  (i.e.,  it  contains  multiple  classes  and 
associations).  In  other  words,  the  closure  property  is 
preserved.  Thus,  the  result  of  a query  can  be  an  operand 
of  another  query. 

The  distinguishing  features  of  OQL  are  summarized  as 
follows : 

(1)  A subdatabase  returned  by  a query  represents  a 
"context"  under  which  some  operations  can  be 
specified  and  executed.  In  an  OQL  query,  the 
specification  of  the  Context  subdatabase  is 
separated  from  the  specification  of  the  operations 
on  that  subdatabase.  This  allows  different 
operations to be performed on different object
classes in the specified context.

(2) Set operations can be performed on union-compatible
subdatabases. The result of a set operation is a new
subdatabase that can be further manipulated in the
normal way.

(3)  The  association  operators  and  the  AND/OR  operators 
allow  very  complex  association  patterns  to  be 
specified  in  a simple  way.  The  same  functionality 
would  be  specified  in  SQL,  for  example,  by  a complex 
nesting  of  Select-From-Where  blocks. 

(4) Comparison operators (i.e., '=' and '!=') can be
used to compare objects that belong to E-classes
directly without referencing their attributes.

(5) OQL makes full use of the inheritance property of
the generalization association. A class inherits all
the associations that emanate from or connect to its
superclasses.

OQL  is  particularly  suited  for  implementation  on  a 
graphics  system.  A query  can  be  specified  by  browsing  the 
S-diagram  of  object  classes  and  pointing  and  traversing 
object  classes  to  enter  qualification  conditions  and 
association  operators.  An  implementation  of  OQL  on  a SUN 
workstation  is  reported  in  a master's  thesis  [TY88]. 

Figure 4.1: The Effect of a Select Subclause
4.1a: The Result of a Context Expression of a Query
4.1b: The Projection of the Subdatabase of Fig. 4.1a over Some Descriptive Attributes

Figure 4.2: The Result of a Projection Operation at Intensional and Extensional Levels
4.2b: The Extensional Diagram Corresponding to

Figure 4.3: The Intensional Pattern of the Subdatabase Returned by the Select Clause of a Query

Figure 4.4: A Graphical Representation of a Branching Association Pattern

Figure 4.5: Extensional Pattern Types Specified by Context Expressions that Form Networks
(panels: "A * AND (B, C) AND * D" and "A * AND (B, C) OR * D")

Figure 4.6: An Example of Applying a Set Operator to Two Union-Compatible Subdatabases
4.6a: Two Union-Compatible Subdatabases
4.6b: The Result of Applying the UNION Operation to the Two Subdatabases Represented in Figure 4.6a


CHAPTER  5 

DEDUCTIVE  OBJECT-ORIENTED  DATABASES 
5.1  Introduction 

Merging expert system (ES) and database management
system (DBMS) technologies has drawn much interest in
recent years [GAL84, ULL85, STO87, RAS88, MAI88]. This
interest  is  motivated  mainly  by  the  need  for  future  ESs 
that  deal  with  large  amounts  of  data  as  well  as  the  need 
for  future  DBMSs  that  have  deduction  capabilities  and, 
therefore,  can  support  many  of  the  new  database 
application  areas  such  as  CAD/CAM,  office  automation,  and 
multi-media  databases.  Several  efforts  have  been  made  to 
design  and  integrate  a deductive  PROLOG-based  rule 
language with a relational DBMS [JAR84, CHA84, VAS84,
ULL85, CER86, STO87, MAI87, MAI88, DEL88]. In this
approach,  deductive  rules  are  declared  in  the  form  of  a 
logic  program  against  base  relations  in  the  database. 
Each  rule  defines  a virtual  relation  that  is  derived  from 
other  base  and/or  virtual  relations.  (A  survey  of 
existing  work  in  this  area  is  provided  in  Chapter  2.) 

The  integration  of  such  PROLOG-based  languages  with 
relational  databases  is  facilitated  mainly  by  the  fact 
that  a relational  database  is  closed  under  PROLOG-like 
deductive rules in the sense that the input of a
deductive rule (i.e., the relations that the rule
operates on) can be one or more relations and its output
is always a relation. In other words, the output of a
rule still belongs to the world of relations and
therefore can be uniformly operated on by other deductive
rules to further derive new relations, and so on. It is
our belief that a deductive rule-based language that is
designed for any data model has to preserve the closure
property with respect to that data model, meaning that the
derived data must be structured and modeled using the
same data model with which the "base" data are modeled.

Along a different research line, the database
technology itself has been actively moving towards the OO
approach, which is more appropriate than the relational
approach for supporting the new database application
areas mentioned above [HAM81, BAT85, BAN87, FIS87, HUL87,
SU88]. For a database system to be more appropriate for
supporting the new database application areas cited
above, the two research lines, so far independent (i.e.,
deductive databases and OO databases), need to be merged,
leading to deductive OO databases. This can best be done
by designing a deductive rule-based language that
preserves the closure property with respect to OO data
models (i.e., models that support aggregation,
generalization, and the unique identification of
objects). In this case, the output of a rule must be
represented using the constructs of the OO data model in
order to make it possible for such an output to be
operated on uniformly by other rules to further produce
other derived data and thus form inference chains.

In this chapter, we study the semantics of deductive
rules in OO databases and describe the component of the
knowledge definition language that can be used for
defining such rules. This language facilitates deriving
new patterns of associations among objects based on the
existing patterns. The IF clause of a deductive rule may
contain an OQL expression that identifies certain
association patterns in the database. The THEN clause
derives new association patterns among some of the
objects that fall in the patterns identified by the IF
clause. The set of patterns derived by a rule is held in
a subdatabase, which has structural characteristics that
are similar to those of the original database, i.e., it
consists of classes and their aggregation and
generalization associations (Chapter 3). A derived
subdatabase can be used by other rules to further derive
new subdatabases, thus forming inference chains.
Furthermore, OQL queries can be used to manipulate the
data derived by the rules in the same way the "base" data
are manipulated.




This chapter is organized as follows. After this
introduction, we present the language constructs that can
be used for defining deductive rules in OO databases and
describe the semantics of these rules. Section 5.3
presents  a technique  for  defining  several  deductive  rules 
within  some  common  database  context.  Section  5.4 
describes  how  the  derived  data  can  be  queried  uniformly 
using  the  constructs  of  the  OQL  query  language.  The 
transitive  closure  operation  is  described  in  Section  5.5. 
Some  conclusions  are  given  in  Section  5.6. 

5.2 Deductive Rules in OO Databases

A deductive rule in our language has an If-Then
structure, and the If clause contains an OQL expression
that identifies the subdatabase of interest. This
structure is shown below.

IF CONTEXT association pattern expression
WHERE conditions
THEN (subdatabase-id (classes)) | (attribute := expression)

The  Then  clause  of  a rule  derives  either  a new 
subdatabase  or  the  values  of  a derived  attribute.  The 
following  two  example  rules  derive  attribute  values. 

R1 Suppose the class Person in Figure 3.1 has two
attributes: Age, whose values are not explicitly stored, and
Date-of-birth, whose values are stored. The following rule
derives the Age values from the Date-of-birth values.

IF  CONTEXT  Person 

THEN Age := Current-date - Date-of-birth

In this rule, we assume that Current-date is a
system-defined function that returns the current date
when called.

R2 The classification of a student is the integer
quotient of the number of courses he/she has taken
divided by 5.

IF CONTEXT Student * Transcript * Course

THEN classification := COUNT (Course BY Student) DIV 5

In  this  rule,  the  operator  BY  returns,  for  each  student 
instance,  the  set  of  Course  instances  associated  with  it 
(via  some  Transcript  instances).  COUNT  is  a function  that 
returns  the  number  of  courses  in  each  one  of  these  sets. 
DIV  is  an  operator  that  returns  only  the  quotient  of  a 
division  (i.e.,  without  the  remainder). 
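To make the semantics of R2 concrete, here is a minimal Python sketch. It assumes the extensional patterns of Student * Transcript * Course are available as (student, transcript, course) identifier triples; the function name and data layout are hypothetical, not part of the language. The sketch groups Course instances by Student (the BY operator), counts each group (COUNT), and keeps only the integer quotient by 5 (DIV).

```python
from collections import defaultdict


def derive_classification(patterns):
    """Sketch of rule R2: classification := COUNT (Course BY Student) DIV 5.

    patterns: extensional patterns of Student * Transcript * Course, assumed
    here to be (student, transcript, course) identifier triples.
    """
    courses_by_student = defaultdict(set)
    for student, _transcript, course in patterns:
        # BY: collect, for each Student instance, its associated Course instances
        courses_by_student[student].add(course)
    # COUNT each set of distinct courses, then DIV 5 (remainder discarded)
    return {s: len(cs) // 5 for s, cs in courses_by_student.items()}


# Hypothetical data: s1 has taken 12 distinct courses, s2 has taken one.
patterns = [("s1", f"tr{i}", f"c{i}") for i in range(12)] + [("s2", "tr99", "c1")]
print(derive_classification(patterns))  # {'s1': 2, 's2': 0}
```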

A rule can also derive a subdatabase. In this case,
the subdatabase-id in the Then clause is a unique name
given to the derived subdatabase. The intensional
pattern of the derived subdatabase consists of the
classes referenced in the argument list following the
subdatabase-id in the Then clause. These classes should
be a subset of the classes referenced in the association
pattern expression of the If clause; unreferenced
classes are not retained in the derived subdatabase.
The extensional patterns, on the other hand, are derived
from the extensional patterns that satisfy the conditions
of the If clause and its WHERE subclause. This is
illustrated by the following deductive rule.

R3 Derive the subdatabase Teacher-course, in which only
the classes Teacher and Course appear, such that each
Teacher instance is linked to a Course instance it teaches
(i.e., the teacher teaches one of the sections of the course).

IF  CONTEXT  Teacher  * Section  * Course 

THEN  Teacher-course  (Teacher,  Course) 

If this rule is applied to the subdatabase SDB of
Figure  3.3,  it  returns  the  subdatabase  Teacher-course 
whose  intensional  pattern  and  set  of  extensional  patterns 
are  shown  in  Figure  5.1.  The  intensional  pattern  of 
Teacher-course  is  composed  of  the  classes  Teacher  and 
Course  only.  The  class  Section  is  not  retained  in  this 
subdatabase  because  it  is  not  referenced  in  the  argument 
list  following  the  subdatabase  name  in  the  Then  clause. 
Since  Teacher  and  Course  in  the  operand  database  are  not 
directly  associated  but  are  associated  through  Section,  a 
new  direct  association  is  derived  between  them  in  the 
resulting  subdatabase  (Figure  5.1a). 

At the extensional level, new direct links are
inferred between the instances of Teacher and Course. In
other words, the derived subdatabase contains a set of
extensional patterns, each consisting of a Teacher instance
and a Course instance with a direct link between them,
provided that the two instances satisfy the associativity
condition specified in the If clause, i.e., the two
instances must coexist in the operand database in an
extensional pattern that also contains a Section instance.
For example, a direct association is derived between the
instances t1 and c1, as shown in Figure 5.1b, because t1
and c1 are associated through s2 in the operand subdatabase
SDB of Figure 3.3.
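The projection performed by R3 can be sketched in Python as follows, assuming each extensional pattern is a tuple aligned with the classes of the If clause. The function name and tuple layout are hypothetical; the instances t1, s2, and c1 come from the example above, and the remaining instances are invented for illustration. Using a set means that two patterns which become identical after Section is dropped yield a single derived link.

```python
def derive_subdatabase(patterns, context_classes, target_classes):
    """Sketch of a subdatabase-deriving rule such as R3.

    patterns: extensional patterns satisfying the If clause, each a tuple
    aligned with context_classes. Only the classes listed in the Then
    clause are kept; identical projected patterns collapse into one.
    """
    positions = [context_classes.index(c) for c in target_classes]
    return {tuple(p[i] for i in positions) for p in patterns}


context = ("Teacher", "Section", "Course")
sdb = [("t1", "s2", "c1"), ("t1", "s5", "c1"), ("t2", "s3", "c2")]
teacher_course = derive_subdatabase(sdb, context, ("Teacher", "Course"))
print(teacher_course)  # the direct links t1-c1 and t2-c2
```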

If  a target  class  in  a derived  subdatabase  is  to 
inherit  only  a subset  of  the  descriptive  attributes  of 
its  source  class,  then  these  attributes  should  be  listed 
in  brackets  following  the  class  name  in  the  Then  clause, 
otherwise  all  attributes  are  inherited  (i.e.,  the  default 
is  "all  attributes").  For  example,  if  the  class  Teacher 
in  the  subdatabase  Teacher-course  is  to  inherit  only  the 
attributes  ss#  and  Degree,  the  above  rule  will  be 
expressed  as 

IF  CONTEXT  Teacher  * Section  * Course 

THEN Teacher-course (Teacher [ss#, Degree], Course)

In this case, the attribute Name will be inaccessible
from the class Teacher-course:Teacher.

The following are some additional rules defined for
the University database of Figure 3.1. Each of these
rules derives a new subdatabase and illustrates a certain
aspect of the semantics of deductive rules in OO
databases.

R4 If the total number of students enrolled in a
course that belongs to the CIS department is greater than
39, then suggest offering the course in the next
semester.

IF CONTEXT Department [name = 'CIS'] * Course *
Section * Student

WHERE COUNT (Student BY Course) > 39

THEN Suggest-offer (Course)

The aggregation function COUNT returns, for each course,
the number of students associated with it (via some
sections). The derived subdatabase Suggest-offer contains
only the class Course together with the Course instances
that satisfy the conditions stated in the If clause and
its WHERE subclause (i.e., the courses that belong to the
CIS department and have more than 39 enrolled students).
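A minimal Python sketch of R4 follows, assuming the extensional patterns of Department * Course * Section * Student are given as (department_name, course, section, student) tuples; the function name, parameters, and data are hypothetical. Students are collected as a set per course, so a student enrolled in two sections of the same course is counted once.

```python
from collections import defaultdict


def suggest_offer(patterns, dept_name="CIS", threshold=39):
    """Sketch of rule R4.

    patterns: extensional patterns of Department * Course * Section * Student,
    assumed here to be (department_name, course, section, student) tuples.
    """
    students_by_course = defaultdict(set)
    for dept, course, _section, student in patterns:
        if dept == dept_name:                        # Department [name = 'CIS']
            students_by_course[course].add(student)  # Student BY Course
    # WHERE COUNT (Student BY Course) > 39
    return {c for c, studs in students_by_course.items() if len(studs) > threshold}


# Hypothetical data: cis1 has 40 students in two sections, cis2 has 39,
# and ee1 (an EE course) has 50.
patterns = (
    [("CIS", "cis1", f"sec{i % 2}", f"st{i}") for i in range(40)]
    + [("CIS", "cis2", "sec9", f"st{i}") for i in range(39)]
    + [("EE", "ee1", "sec7", f"st{i}") for i in range(50)]
)
print(suggest_offer(patterns))  # {'cis1'}
```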

R5 If, for any department, the number of courses that are
suggested to be offered next semester is greater than 20,
then the department needs more resources (i.e.,
budget, rooms, labs, etc.).

IF CONTEXT Department * Suggest-offer:Course
WHERE COUNT (Suggest-offer:Course BY Department) > 20
THEN Deps-need-res (Department)

Since the class Course of the subdatabase Suggest-offer
is a subclass of the base class Course (i.e., there
is an induced generalization association between them),
it inherits the aggregation link with the base class
Department. Hence, the expression "Department *
Suggest-offer:Course" is a legal expression, and so is
the expression "Section * Suggest-offer:Course" used in
the following rule.

R6 If a graduate student is currently teaching a course
that is suggested to be offered, then he/she may teach
the same course in the next semester.

IF CONTEXT TA * Teacher * Section * Suggest-offer:Course

THEN May-teach (TA, Course)

R7 A graduate student may teach an undergraduate course
(i.e., c# < 5000) if he/she has taken the course and
received a grade of B or better.

IF CONTEXT Grad * Transcript [grade >= 'B'] *
Course [c# < 5000]

THEN May-teach (Grad, Course)

Rules R6 and R7 derive extensional patterns into the same
subdatabase (May-teach) but based on different
conditions. Thus, if both rules are applied, May-teach
will contain the union of the two sets of extensional
patterns derived by the two rules. Figure 5.2 shows the
derivation diagram for the subdatabase May-teach.
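The following Python sketch illustrates R7 and the union semantics. It assumes transcript patterns are given as (grad, grade, course_number, course) tuples and that letter grades are mapped to grade points; the scale is an assumption, needed because compared as plain strings 'A' would sort below 'B'. The R6 result is a hypothetical stand-in. Since TA is a subclass of Grad, the pairs produced by R6 and R7 refer to the same kind of objects and can be unioned directly.

```python
# Assumed grade-point scale (not specified in the source).
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def rule_r7(patterns):
    """patterns: (grad, grade, course_number, course) tuples standing for
    extensional patterns of Grad * Transcript * Course."""
    return {(grad, course)
            for grad, grade, c_no, course in patterns
            if GRADE_POINTS[grade] >= GRADE_POINTS["B"] and c_no < 5000}


def may_teach(*rule_results):
    """Rules deriving into the same subdatabase contribute the union
    of their extensional patterns."""
    result = set()
    for pairs in rule_results:
        result |= set(pairs)
    return result


r6 = {("ta1", "c7")}  # hypothetical output of rule R6
r7 = rule_r7([("g1", "A", 3100, "c1"), ("g2", "C", 3200, "c2"),
              ("g3", "B", 6000, "c3")])
print(may_teach(r6, r7))  # the pairs ta1-c7 and g1-c1
```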

5.3  Rules  that  Share  a Common  Context 

When a set of rules shares a common context (i.e., the
rules are to be applied to a specific subdatabase of the
original database), the CONTEXT clause of OQL and its WHERE
subclause are used to establish this common context first.
If a rule defined under the specified context imposes no
restrictions beyond those stated in the context, the rule
is written as a predicate of the form
"subdatabase-id (classes)", i.e., without an If clause.
The key words RULE-WINDOW and END RULE-WINDOW are used to
delimit a common context and the associated set of rules.
Example 

RULE-WINDOW W1

CONTEXT TA * Teacher * Section * Course *
Department [name = 'EE'];

EE-TA (TA, Course);

IF (classification < 8) AND (c# >= 5000)
THEN Weak-sections (Section);

END RULE-WINDOW;

Two rules are defined in the above rule window. The
first rule is the predicate EE-TA (TA, Course), which
derives the subdatabase EE-TA from the common subdatabase
defined by the CONTEXT clause. The intensional pattern of
this subdatabase consists of the two classes TA and
Course and a derived aggregation association between
them. Since this rule has no If clause, the extensional
patterns of the subdatabase it derives include all the
pairs of TA and Course instances that satisfy the
conditions specified in the CONTEXT clause of the rule
window (i.e., the TAs who teach some sections of EE
courses, together with these courses). In contrast, the
subdatabase Weak-sections, which is derived by the second
rule, does not include all the Section instances that
exist in the Context subdatabase. The conditions stated in
the If clause of the second rule are first applied to the
Context subdatabase to produce a new subdatabase with the
set of extensional patterns that satisfy these
conditions. The Section instances that appear in these
extensional patterns become the instances of the
Weak-sections subdatabase.
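The evaluation order described above, in which the common context is computed once and each rule in the window is applied to it, can be sketched in Python as follows. The helper `apply_rule_window`, the dict-based pattern layout, and the attribute keys (`c_no`, `classification`) are hypothetical; in particular, the sketch simply assumes the `classification` value used by the second rule is carried in each pattern.

```python
def apply_rule_window(patterns, context, rules):
    """Evaluate a common CONTEXT once, then apply every rule in the window
    to that shared subdatabase. Each rule maps the context patterns to a
    derived set of patterns."""
    shared = [p for p in patterns if context(p)]
    return {name: rule(shared) for name, rule in rules.items()}


# Hypothetical patterns of TA * Teacher * Section * Course * Department.
patterns = [
    {"TA": "ta1", "Section": "s1", "Course": "c1", "c_no": 5100,
     "classification": 6, "Department": "EE"},
    {"TA": "ta2", "Section": "s2", "Course": "c2", "c_no": 3100,
     "classification": 9, "Department": "EE"},
    {"TA": "ta3", "Section": "s3", "Course": "c3", "c_no": 5200,
     "classification": 4, "Department": "CIS"},
]

w1 = apply_rule_window(
    patterns,
    context=lambda p: p["Department"] == "EE",  # Department [name = 'EE']
    rules={
        # Predicate rule with no If clause: project the whole context.
        "EE-TA": lambda ctx: {(p["TA"], p["Course"]) for p in ctx},
        # Rule with an If clause: filter the context, then project.
        "Weak-sections": lambda ctx: {p["Section"] for p in ctx
                                      if p["classification"] < 8 and p["c_no"] >= 5000},
    },
)
```

Note that s3 would satisfy the If clause of the second rule, but it is excluded because its pattern lies outside the EE context.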

Defining  a set  of  rules  under  a common  context  is  a 
mechanism  that  achieves  modularity  and  saves  on  the 
amount  of  code  to  be  written.  These  two  advantages  are 
very  important  in  large  knowledge  bases.  The  alternative 
to  identifying  a context  that  is  common  to  a set  of  rules 
is  to  duplicate  the  conditions  specified  in  the  common 
context  in  the  If  clause  of  each  of  the  rules. 




5.4  Querying  the  Derived  Data 

Once  the  deductive  rules  that  derive  new 
subdatabases  are  defined,  the  classes  of  the  derived 
subdatabases  can  be  referenced  in  association  pattern 
expressions in any OQL query in the normal way. The
association operator can be used between any two classes,
even if they belong to different subdatabases, provided
there is an association (possibly inherited) between
them. For example, the following OQL query operates
on the classes Faculty and Advising from the
original database and the class TA from the subdatabase
May-teach.

Query For the teaching assistants who may teach a course
in the next semester, have advisors, and have GPAs
less than 3.5, display their names and their advisors'
names.

CONTEXT Faculty * Advising *
May-teach:TA [GPA < 3.5]

DISPLAY TA [name], Faculty [name]

The  class  May-teach:TA  is  a subclass  of  the  base  class 
TA,  which  in  turn  is  a subclass  of  the  class  Grad.  Thus, 
May-teach:TA  inherits  the  aggregation  association  with 
Advising  from  Grad  and  consequently  the  expression 
"Advising * May-teach:TA" in the above query is legal.

In the backward chaining strategy (backward and
forward chaining strategies are described in Chapter 7),
this query is evaluated as follows. Since TA is
referenced in the query in the context of May-teach,
rules R6 and R7 are triggered for execution to derive
the subdatabase May-teach. However, in order to derive
May-teach, the subdatabase Suggest-offer (which is
referenced by rule R6) must be derived first. This causes
rule R4, which derives Suggest-offer, to be triggered for
execution. R4 does not refer to any derived subdatabase;
hence, the expressions of R4 are evaluated against the
base classes (similarly, the expressions of R7 are
evaluated against the base classes). The result is then
fed to rule R6, which participates in deriving May-teach.
The subdatabase May-teach is then used to evaluate the
given query.
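The demand-driven derivation just described can be sketched in Python as follows. This is a minimal illustration, not the evaluation strategy of Chapter 7 itself: the rule table, its (dependencies, function) layout, and the stand-in results for R4 and R7 are hypothetical. Each rule lists the derived subdatabases it refers to; those are derived first (recursively), and the results of all rules deriving the same subdatabase are unioned. The sketch assumes the rule dependencies are acyclic.

```python
def derive(name, rules, cache=None, order=None):
    """Backward chaining sketch: derive subdatabase `name` on demand.

    rules maps a subdatabase name to a list of (dependencies, function)
    pairs, one per rule. Dependencies are derived first; the results of
    all rules for `name` are unioned. Assumes acyclic dependencies.
    """
    cache = {} if cache is None else cache
    order = [] if order is None else order
    if name not in cache:
        result = set()
        for deps, fn in rules[name]:
            inputs = [derive(d, rules, cache, order) for d in deps]
            result |= fn(*inputs)
        cache[name] = result
        order.append(name)  # records the order in which subdatabases are derived
    return cache[name]


teaches = {("ta1", "c1"), ("ta2", "c4")}  # hypothetical TA-course teaching links
rules = {
    # R4: evaluated against base classes only (stand-in result)
    "Suggest-offer": [((), lambda: {"c1", "c2"})],
    "May-teach": [
        # R6: depends on the derived subdatabase Suggest-offer
        (("Suggest-offer",), lambda offered: {(t, c) for t, c in teaches if c in offered}),
        # R7: evaluated against base classes only (stand-in result)
        ((), lambda: {("g2", "c3")}),
    ],
}

order = []
may_teach = derive("May-teach", rules, order=order)
print(order)  # ['Suggest-offer', 'May-teach']
```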

5.5  Transitive  Closure 

The transitive closure operation is expected to be a
fundamental operation in future database and
knowledge systems. The transitive closure operation is
performed in our language by iterating over some classes
and associations that form a cycle. Let A, B, and C be
three classes in a schema that form a cycle, i.e., A has
associations with both B and C, and B has an association
with C. The following rule derives a subdatabase X that
contains pairs of instances of the class A that are
associated with each other via some B and C instances
(this is similar to Query 3 in Chapter 4). In other
words, each extensional pattern in this subdatabase
contains two objects that are both drawn from the class
A.

IF  CONTEXT  A * B * C * A_1 

THEN X (A, A_1)

By extending the Context expression in this rule to
form a second iteration, i.e., the expression
"{A * B * C * A_1} * B_1 * C_1 * A_2", one can derive a
subdatabase that contains a three-level hierarchy of A
instances. The braces are used in the expression (Chapter 4)
to keep the first- and second-level A instances that are
associated with each other even if they are not associated
with third-level A instances.

This iteration can be expressed in our language by
adding the "@" sign as a superscript at the end of an
association pattern expression that forms a cycle. An
optional number N (the number of iterations) following the
@ sign causes the underlying system to traverse the cycle N
times. (For a given hierarchy of instances, iteration
stops when Null values are encountered or at the Nth
iteration, i.e., at the Nth descendant from the root of the
hierarchy.) If such a number is not present, the cycle is
traversed until Null values are obtained for all the
hierarchies of instances (i.e., the transitive closure
operation is performed). Using this technique, the two
expressions "{A * B * C * A_1} * B_1 * C_1 * A_2" and
"{{A * B * C * A_1} * B_1 * C_1 * A_2} * B_2 * C_2 * A_3"
are represented as (A * B * C)@2 and (A * B * C)@3,
respectively. This approach represents the transitive
closure operation as a loop rather than in recursive form;
the looping form is the more user-friendly representation.
The following are two example rules.

R8 A graduate student may be taught by other graduate
students in some sections and may also teach other
graduate students in some other sections. Derive the
Grad-teaching-grad hierarchy. (It is assumed here that
the relationship between the instances of the class Grad
is not cyclic, i.e., if Grad g1 teaches Grad g2 or
teaches any of the teachers of g2, then g2 cannot teach
g1 or any of the teachers of g1.)

IF (Grad * TA * Teacher * Section * Student)@

THEN Grad-teaching-grad (Grad, Grad_@)

The intensional pattern of the resulting subdatabase
Grad-teaching-grad consists of the classes Grad, Grad_1,
Grad_2, ..., up to the level at which Null values are
encountered in all the hierarchies (the second argument to
Grad-teaching-grad, i.e., Grad_@, stands for Grad_1,
Grad_2, ...). In other words, the intensional pattern of
the resulting subdatabase is determined at run time.

R9 Derive a subdatabase that contains only the first and
third levels of the Grad-teaching-grad hierarchy.

IF (Grad * TA * Teacher * Section * Student)@2
THEN first-and-third (Grad, Grad_2)
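The looping semantics of the @ sign, as used in R8 and R9, can be sketched in Python as follows. The sketch assumes a precomputed `step` mapping from an instance to the instances reached by one traversal of the cycle (for R8, from a Grad to the Grads he/she teaches); `step`, `iterate_cycle`, and the data are hypothetical. A hierarchy that cannot be extended is kept and padded with None (Null), mirroring the role of the braces. Without N, iteration stops once every hierarchy ends in Null, which terminates only if the instance relationships are acyclic, as R8 assumes.

```python
def iterate_cycle(step, roots, n=None):
    """Sketch of (A * B * C)@N: returns hierarchy paths (A, A_1, A_2, ...).

    step: maps an instance to the instances reached by one traversal of
    the cycle (assumed precomputed). Paths that cannot be extended are
    padded with None. With n=None, iteration stops when all paths end in
    None; this assumes the instance relationships are acyclic.
    """
    paths = [(r,) for r in roots]
    level = 0
    while n is None or level < n:
        if all(p[-1] is None for p in paths):
            return [p[:-1] for p in paths]  # drop the all-Null final level
        extended = []
        for path in paths:
            successors = step.get(path[-1], []) if path[-1] is not None else []
            if successors:
                extended.extend(path + (s,) for s in successors)
            else:
                extended.append(path + (None,))  # keep the partial hierarchy
        paths = extended
        level += 1
    return paths


# Hypothetical data: g1 teaches g2 and g4; g2 teaches g3.
step = {"g1": ["g2", "g4"], "g2": ["g3"]}
closure = iterate_cycle(step, ["g1"])          # like R8: (...)@
two_levels = iterate_cycle(step, ["g1"], n=2)  # like R9: (...)@2
# Like R9: keep the first and third levels, here only where a third level exists.
first_and_third = {(p[0], p[2]) for p in two_levels if p[2] is not None}
print(closure)  # [('g1', 'g2', 'g3'), ('g1', 'g4', None)]
```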

5.6  Conclusion 

In this chapter, we introduced the component of the
knowledge definition language that facilitates defining
deductive rules for OO databases. This component has the
following advantages.

(1) A derived subdatabase has a structure that is
similar to that of the original database (i.e., it
consists of a set of classes and their associations
together with a set of instances for each class);
therefore, derived subdatabases and the original
database can be used uniformly (i.e., by the same
language constructs) to further derive other
subdatabases.

(2) Modularity in the design, which is especially useful
in very large databases, can be achieved by grouping
a set of rules that share a common context in a
single rule window.

(3) The transitive closure operation is supported in our
language. This operation is not represented in a
recursive form but rather in the form of looping,
which is a simpler representation, especially for end
users.




Figure 5.1: The Intensional Pattern and Extensional Diagram
for a Subdatabase that is Derived by a Rule
(a) Intensional pattern of the derived subdatabase
Teacher-course, consisting of the classes Teacher and Course
(b) The extensional diagram corresponding to Figure 5.1a,
linking Teacher instances to Course instances (c1, c2)

Figure 5.2: The Derivation Diagram for the Subdatabase May-teach


CHAPTER  6 

INTEGRITY CONSTRAINTS IN OO DATABASES
6.1  Introduction 

For a database to be an accurate model of an
application world, constraints (integrity rules) that
identify the invalid (illegal) states of the
real-world entities and their relationships need to be
captured in the database. Unlike in the area of OO
databases, considerable research on constraints
pertaining to relational databases has been reported in
the literature [HEL76, BUN79, DAT81, SIM87].

Existing OO data models support the definition of
only a limited set of constraint types, such as those
pertaining to the objects of a single class or those
relating objects of two directly associated classes
(e.g., cardinality constraints). Also, none of the
existing implemented OO DBMSs (e.g., GEMSTONE [SER86],
Vbase [ONT87], and IRIS [FIS87]) has an adequate
constraint specification and management component.

In an OO database, the capability to specify
constraints that prevent objects from existing in
certain patterns of associations needs to be provided.
Also, if some objects appear (or do not appear) in
certain association patterns, constraints stating that
these objects must (or must not) exist in some other
specified association patterns may need to be defined. In
this chapter, we study the semantics of integrity
constraints in OO databases and describe the mechanisms
and language constructs for defining such constraints.

6.2  Constraints  and  First  Order  Logic 

Though a high-level, user-friendly constraint
specification language can be provided, the underlying
principles of such a language can be found in first-order
logic. The basic building block normally used for
defining integrity rules is the predicate (an expression
that returns TRUE or FALSE). In its simplest form, an
integrity rule consists of a single predicate. The
logical connectives AND, OR, NOT, and IMPLIES (-->) can
be used between predicates. A predicate may be a scalar
or a set comparison predicate. Scalar comparison
predicates are =, !=, >, <, >=, and <=. Examples of set
comparison predicates are EQUAL, NOT-EQUAL, SUBSET-OF,
SUPERSET-OF, IN, and INCLUDE.

These predicates can be specified in the CONTEXT
clause of OQL and its WHERE subclause, in addition to
the association conditions among objects (i.e., those
expressed using the association and non-association
operators) described in Chapter 4. Therefore, in the
following sections we use the CONTEXT clause and
its WHERE subclause as part of our constraint
specification language.

We represent the logical connective IMPLIES in the
form of IF-THEN-ELSE rules. For example, the expression
P1 --> P2 is represented as IF P1 THEN P2. The two
expressions P1 --> P2 and NOT (P1) --> P3 are represented
together as IF P1 THEN P2 ELSE P3, where P1, P2, and P3 are
predicates.
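The equivalence between the IF-THEN-ELSE form and the pair of implications can be checked exhaustively with a short Python sketch; the function names are illustrative only. Over all eight truth assignments, IF P1 THEN P2 ELSE P3 (read as the conjunction of P1 --> P2 and NOT (P1) --> P3) holds exactly when the branch selected by P1 holds.

```python
from itertools import product


def implies(p, q):
    """Material implication p --> q."""
    return (not p) or q


def if_then_else(p1, p2, p3):
    """IF P1 THEN P2 ELSE P3 as (P1 --> P2) AND (NOT (P1) --> P3)."""
    return implies(p1, p2) and implies(not p1, p3)


# Exhaustive check: the rule holds exactly when the selected branch holds.
for p1, p2, p3 in product([False, True], repeat=3):
    assert if_then_else(p1, p2, p3) == (p2 if p1 else p3)
```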

6.3 Constraints on Association Patterns

In this section, we identify, and illustrate with some
examples, the different kinds of constraints that can be
specified on the association patterns that objects may
fall into. These constraints can be classified into two
categories. The first category consists of constraints
that specify non-permissible extensional pattern types of
object associations (extensional pattern types are defined
in Chapter 4). Objects in the database cannot, at any point
in time, be interrelated in a way that forms extensional
patterns of any of these non-permissible pattern types.
The second category consists of constraints specifying
that if certain objects exist (or do not exist) in
extensional patterns of a certain type, these objects must
(or must not) exist in extensional patterns of some other
type. The following two subsections describe each of these
two kinds of constraints in some detail.

6.3.1  Non-permissible  Extensional  Pattern  Types 

The following is the syntax for constraints that
specify non-permissible extensional pattern types of
object associations.

NOT EXIST (CONTEXT association pattern expression
WHERE conditions)

When the optional WHERE subclause is not provided,
the key word CONTEXT may be omitted, giving the
following simpler form.

NOT EXIST (association pattern expression)

A constraint of this type states that no extensional
association pattern belonging to the extensional pattern
type identified by the association pattern expression
given in the constraint may exist in the database at any
time. Stated differently, the association pattern
expression of such a constraint defines a subdatabase,
and the NOT EXIST clause states that this subdatabase must
always be empty (i.e., it has no extension). Updates
issued against the database must not cause the formation
of any extensional patterns that satisfy the conditions on
which the subdatabase is defined; otherwise, the database
will enter an inconsistent state. The idea of treating
constraints as view (subdatabase) definitions and
monitoring the tuples (or "extensional patterns" in our
case) that enter the view as a result of database updates
has been reported in [BUN79] for the relational data model.
Examples 

(1)  A faculty  member  must  have  a Ph.D.  degree.  In  other 
words,  no  faculty  member  may  have  other  than  the 
Ph.D.  degree  as  his/her  highest  degree. 

NOT  EXIST  (Faculty  [degree  <>  "Ph.D"]) 

(2)  An  undergraduate  student  cannot  be  registered  in  a 
graduate  course. 

NOT EXIST (Undergrad * Section * Course [c# > 5000])

The database operations that may cause a pattern of
the type identified by the association pattern expression
in the above constraint to be formed (and therefore
violate the constraint) are: updating the c# attribute
of Course, associating a Section instance with a Course
instance, or associating an Undergrad instance with a
Section instance. Note that associating an Undergrad
instance with a Section instance may in turn be caused by
associating a Student instance with a Section instance
and/or a Student instance with an Undergrad instance
(Undergrad inherits the association with Section from its
superclass Student).
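Treating the constraint as a view that must stay empty can be sketched in Python as follows, using example (2). The dict layout of the hypothetical database, the function names, and the data are assumptions for illustration. The view computes the extensional patterns of Undergrad * Section * Course [c# > 5000]; an update is rejected if it would make the view non-empty.

```python
def violating_patterns(db):
    """Extensional patterns of Undergrad * Section * Course [c# > 5000].

    The constraint NOT EXIST (...) holds exactly when this set is empty.
    """
    return {(u, s, c)
            for u, s in db["enrolled"] if u in db["undergrads"]
            for s2, c in db["section_of"] if s2 == s and db["c_no"][c] > 5000}


def apply_update(db, update):
    """Apply an update; reject it if it makes the constraint's view non-empty."""
    new_db = update(db)
    if violating_patterns(new_db):
        raise ValueError("update would violate the NOT EXIST constraint")
    return new_db


db = {
    "undergrads": {"u1"},
    "enrolled": {("u1", "s1"), ("g1", "s2")},   # Student-Section links
    "section_of": {("s1", "c1"), ("s2", "c2")},  # Section-Course links
    "c_no": {"c1": 3100, "c2": 6100},
}
print(violating_patterns(db))  # set()
```

For instance, an update that associates u1 with section s2 (a section of the graduate course c2) would form the pattern (u1, s2, c2) and be rejected.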

(3) A graduate student majoring in any department of the
college of engineering can be either a TA or an RA,
but not both at the same time.

NOT EXIST (Department [college = "engineering"]
*Major Grad * AND (TA, RA))
Figure  6.1a  shows  the  different  extensional  pattern 
types  that  may  be  formed  in  the  database  and  that  involve 
instances  of  the  classes  Department,  Grad,  TA,  and  RA. 
Figure 6.1b shows the non-permissible extensional pattern
type  identified  by  the  association  pattern  expression  in 
this  constraint. 

(4) A student cannot take a course that he/she has taken
before.

NOT EXIST (Student_1 * Section * Course *
Transcript * Student_1)

This constraint states that no extensional pattern that
forms a cycle starting at a particular student instance
and returning to the same student instance through
Section, Course, and Transcript instances is permitted in
the database. Note that both occurrences of the class name
Student_1 in the above expression are, at any given time,
bound to the same student instance.
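The effect of binding both occurrences of Student_1 to the same instance can be sketched in Python as a join whose final condition closes the cycle. The argument layout and the data are hypothetical: `enrolled` holds Student-Section links, `section_of` Section-Course links, `transcript_course` Transcript-Course links, and `transcript_of` maps each Transcript to its Student. The constraint holds when the function returns an empty set.

```python
def retake_patterns(enrolled, section_of, transcript_course, transcript_of):
    """Patterns of Student_1 * Section * Course * Transcript * Student_1.

    Both occurrences of Student_1 must bind to the same student object,
    which the final condition enforces.
    """
    return {(st, sec, c, tr)
            for st, sec in enrolled
            for sec2, c in section_of if sec2 == sec
            for tr, c2 in transcript_course if c2 == c
            if transcript_of[tr] == st}


# Hypothetical data: st1 is enrolled in a section of c1 and already has
# a transcript entry for c1, so the constraint is violated.
print(retake_patterns({("st1", "sec1")}, {("sec1", "c1")},
                      {("tr1", "c1")}, {"tr1": "st1"}))
```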
6.3.2  Conditional  Constraints 

A constraint in this category specifies whether
objects of certain classes must or must not exist in
extensional patterns of a given subdatabase if these
objects appear (or do not appear) in extensional patterns
of some other subdatabases. These constraints are
specified in the form of IF-THEN-ELSE rules that have the
following structure (the ELSE clause is optional).

IF NOT EXIST (subdatabase-definition 1)

THEN NOT EXIST (subdatabase-definition 2)

ELSE NOT EXIST (subdatabase-definition 3)

where each occurrence of the key words NOT EXIST is
optional, and at least one of the classes referenced in
the subdatabase definition of the IF clause must also be
referenced in the subdatabase definition of the THEN
clause and, if the optional ELSE clause is used, in that
of the ELSE clause. The following are some constraints of
this kind (in these constraints, we omit the key word
CONTEXT from the definition of a subdatabase when that
definition does not contain any WHERE or SELECT
subclauses).

Examples 

(1) A graduate student who is an RA must have an advisor,
i.e., if a Grad instance exists in an extensional
association pattern with an RA instance, it must also
exist in an extensional association pattern with an
instance of the class Advising. Stated differently,
the set of graduate students who are RAs should be,
at any point in time, a subset of the set of graduate
students who have advisors.

IF Grad * RA

THEN Grad * Advising

Since Grad appears in both the IF and THEN clauses, this
rule can be read as follows: if a Grad instance appears
in any extensional pattern of the subdatabase defined by
the IF clause, it must also appear in at least one
extensional pattern of the subdatabase defined by the
THEN clause.

(2) TAs who are majoring in the department of Electrical
Engineering cannot teach courses that belong to other
departments.

IF TA *Major Department_1 [name = "EE"]

THEN NOT EXIST (TA * Teacher * Section * Course
* Department_2 [name <> "EE"])

This constraint can be read as follows: if a TA
instance exists in any extensional pattern of the
subdatabase defined by the IF clause, it must not at
the same time exist in any extensional pattern of the
subdatabase defined by the THEN clause.

(3) A faculty member who is not an advisor of any
graduate student must teach at least one section.

IF NOT EXIST (Faculty * Advising)

THEN Faculty * Section

This constraint states that if a Faculty instance
does not exist in any extensional pattern of the
subdatabase defined by the IF clause, it must exist in at
least one extensional pattern of the subdatabase defined
by the THEN clause. The database operations that could
cause this constraint to be violated, and therefore
transform the database into an inconsistent state, are
dissociating a Faculty instance from an Advising instance
and dissociating a Faculty instance from a Section instance.
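The three readings given above can be captured by one small Python check, sketched under the assumption that each subdatabase has already been reduced to the set of instances of the shared class (Grad, TA, or Faculty) that appear in its extensional patterns. The function name, keyword parameters, and data are hypothetical, and the optional ELSE clause is not covered.

```python
def conditional_holds(if_objects, then_objects, *, if_negated=False,
                      then_negated=False, universe=frozenset()):
    """Check IF [NOT EXIST] (...) THEN [NOT EXIST] (...) for one shared class.

    if_objects / then_objects: instances of the shared class that appear in
    some extensional pattern of the IF / THEN subdatabase.
    universe: all instances of that class (used only when if_negated).
    """
    # Objects that trigger the rule: those in the IF subdatabase or, for
    # IF NOT EXIST, those of the class that do not appear in it.
    triggered = (set(universe) - set(if_objects)) if if_negated else set(if_objects)
    if then_negated:
        # THEN NOT EXIST: no triggered object may appear in the THEN subdatabase
        return not (triggered & set(then_objects))
    # THEN: every triggered object must appear in at least one THEN pattern
    return triggered <= set(then_objects)


# Example (3): f3 neither advises nor teaches, so the constraint is violated.
faculty = {"f1", "f2", "f3"}
print(conditional_holds({"f1"}, {"f2"}, if_negated=True, universe=faculty))  # False
```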

6.4  Specifying  Constraints  on  Specific  Subdatabases 

One or more constraints can be specified on a certain
subdatabase. In this case, the subdatabase is defined
first, followed by the constraint(s). Only the objects
that appear in the subdatabase must satisfy these
constraints. The key words RULE-WINDOW and END
RULE-WINDOW are used to delimit the Context expression and
the associated set of constraints.

Example

The GPA of a graduate student majoring in any department of the college of engineering must be greater than 3.3. Also, if such a graduate student is a TA, he/she must have an advisor.

RULE-WINDOW
CONTEXT Grad *Major Department [college = "engineering"]
(GPA > 3.3);
IF (Grad * TA)
THEN (Grad * Advising);
END RULE-WINDOW

The Context expression defines a subdatabase. The first constraint (i.e., the predicate "GPA > 3.3") states that any Grad instance that appears in any of the extensional patterns of this subdatabase must have a GPA greater than 3.3. The second constraint states that if a Grad instance that appears in this subdatabase is a TA, he/she must have an advisor. Grad, in the second rule, is said to be implicitly qualified by the Context subdatabase (as opposed to explicit qualification such as X:Grad), because Grad is referenced within this Context. If the database is updated such that a new Grad instance enters the subdatabase defined in this rule window (i.e., after the update, the Grad instance satisfies the conditions on which this subdatabase is defined), then that instance must also satisfy the two specified constraints.
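The rule-window semantics can be illustrated with a small Python sketch. It assumes a hypothetical in-memory representation (the `Department` and `Grad` classes, an `is_ta` flag, and an `advisor` field standing in for Grad * Advising) rather than actual OSAM* storage. It only shows that objects outside the Context subdatabase are never checked, while every Grad inside it must satisfy both constraints.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Department:
    name: str
    college: str


@dataclass
class Grad:
    name: str
    major: Department              # stands in for Grad *Major Department
    gpa: float
    is_ta: bool                    # stands in for membership in Grad * TA
    advisor: Optional[str] = None  # stands in for Grad * Advising


def in_window(g: Grad) -> bool:
    # CONTEXT Grad *Major Department [college = "engineering"]
    return g.major.college == "engineering"


def window_violations(grads):
    """Return (grad, constraint) pairs violated inside the rule window."""
    violations = []
    for g in grads:
        if not in_window(g):
            continue  # objects outside the subdatabase are not constrained
        if not g.gpa > 3.3:
            violations.append((g.name, "GPA > 3.3"))
        if g.is_ta and g.advisor is None:
            violations.append((g.name, "IF Grad * TA THEN Grad * Advising"))
    return violations


ee = Department("EE", "engineering")
history = Department("History", "arts")
grads = [
    Grad("ann", ee, 3.8, is_ta=True, advisor="smith"),
    Grad("bob", ee, 3.1, is_ta=True),       # inside the window, breaks both rules
    Grad("cal", history, 2.9, is_ta=True),  # outside the window, ignored
]
print(window_violations(grads))
# [('bob', 'GPA > 3.3'), ('bob', 'IF Grad * TA THEN Grad * Advising')]
```

Re-running the check after an update models the case in which a Grad instance newly enters the window.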

6.5 Cardinality Constraints

The mapping constraints 1:1, n:1, and 1:n can be specified between any two directly associated classes. The default constraint is m:n. In this section, we describe a new kind of mapping constraint, that is, constraints between two indirectly associated classes or between a set of classes and another set.

Let A, B, and C be three classes in a schema where each of the pairs A,B and B,C is directly associated. The mappings specified between A and B and between B and C imply a certain mapping between A and C along the path A,B,C. The following shows some of the possible A:B and B:C mappings and the implied A:C mapping.

A:B     B:C     Implied A:C
1:1     1:1     1:1
1:n     1:1     1:n
1:n     n:1     n:m
1:n     1:n     1:n
n:m     1:n     n:m

Once the mapping A:C is known, the mapping A:D, where D is a class that is directly associated with C, can also be inferred. In general, the mapping between any two indirectly associated classes, say H and I, along a certain path can be inferred if the mapping between every two directly associated classes in the path between H and I is given. We call the mapping between two directly associated classes a local mapping, and the mapping between two indirectly associated classes along a certain path a global mapping.
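The inference of a global mapping from local mappings can be sketched in Python. The encoding is an assumption: a mapping "p:q" between X and Y is read as "each Y relates to p X's and each X relates to q Y's", where "1" means at most one and any other letter means many. Under that reading the sketch reproduces the rows of the A:B, B:C, A:C table above and returns the least restrictive mapping that the path guarantees, which a global constraint may then tighten.

```python
NOTATION = {(False, False): "1:1", (False, True): "1:n",
            (True, False): "n:1", (True, True): "n:m"}


def parse(mapping: str) -> tuple:
    """Read "p:q" for X:Y as (many X per Y?, many Y per X?)."""
    p, q = (side.strip() for side in mapping.split(":"))
    return p != "1", q != "1"


def compose(ab: str, bc: str) -> str:
    """Implied A:C mapping along the path A, B, C."""
    a_per_b, b_per_a = parse(ab)
    b_per_c, c_per_b = parse(bc)
    a_per_c = a_per_b or b_per_c  # a C reaches A's through possibly many B's
    c_per_a = c_per_b or b_per_a  # an A reaches C's through possibly many B's
    return NOTATION[(a_per_c, c_per_a)]


def global_mapping(local_mappings):
    """Fold compose() over the local mappings of a path H, ..., I."""
    result = local_mappings[0]
    for m in local_mappings[1:]:
        result = compose(result, m)
    return result


# Teacher:Section = 1:n and Section:Course = n:1
print(global_mapping(["1:n", "n:1"]))  # n:m
```

Because each step only tracks whether a side can be "many", the result is an upper bound; real-world semantics can only make it stricter.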

In some real world situations, the global mapping between two classes along a certain path may need to be more restrictive than what is implied by the local mappings along the path (a less restrictive mapping is neither meaningful nor possible). We need to provide the capability to capture such a constraint. This facility is not currently supported by any of the existing OO models and systems. For example, suppose the following mappings exist in the schema of Figure 3.1: Teacher:Section = 1:n and Section:Course = n:1. This implies the mapping Teacher:Course = n:m. The real world semantics, however, may state that a teacher can teach many sections but all of them must belong to a single course, while the sections of a course can be taught by several teachers. In other words, the mapping Teacher:Course along the path that goes through Section should be n:1. This global mapping constraint is more restrictive than what is implied by the local mapping constraints (which is n:m).

The above constraint between Teacher and Course can be specified as

CONTEXT Teacher * Section * Course
MAPPING Teacher:Course = n:1

This constraint can be viewed as a constraint on the patterns of object associations. In this sense, it states that if a Teacher instance exists in any two of the extensional patterns identified by the Context expression, then these two extensional patterns must contain the same Course instance (but not necessarily the same Section instance).
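The pattern-based reading of this constraint amounts to a functional-dependency check over the extensional patterns. The Python sketch below assumes a hypothetical representation of each <Teacher, Section, Course> pattern as a tuple of object identifiers; it is an illustration of the semantics, not the enforcement mechanism of the system.

```python
from collections import defaultdict


def teacher_course_violations(patterns):
    """Check MAPPING Teacher:Course = n:1 over <Teacher, Section, Course> patterns.

    A Teacher instance that appears in two extensional patterns must find the
    same Course instance in both; the Section instances may differ.
    """
    courses_of = defaultdict(set)
    for teacher, _section, course in patterns:
        courses_of[teacher].add(course)
    return {t: sorted(cs) for t, cs in courses_of.items() if len(cs) > 1}


patterns = [
    ("t1", "s1", "c1"),
    ("t1", "s2", "c1"),  # same course through a different section: allowed
    ("t2", "s3", "c1"),  # several teachers for one course: allowed
    ("t3", "s4", "c2"),
    ("t3", "s5", "c3"),  # t3 reaches two courses: violation
]
print(teacher_course_violations(patterns))  # {'t3': ['c2', 'c3']}
```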

If the mapping between Teaching Assistants (and not Teachers in general) and Course should be n:1, then the above constraint can be specified as follows.

CONTEXT TA * Teacher * Section * Course
MAPPING TA:Course = n:1

or as

CONTEXT TA * Teacher * Section * Course
MAPPING Teacher:Course = n:1

Intra-class conditions can be specified following a class name in the Context expression, and inter-class conditions can be specified in the WHERE subclause. When such conditions are stated, the specified mappings apply only to the objects that satisfy these conditions and participate in extensional patterns of the specified types.

Using this technique, more restrictive local mapping constraints can also be specified. For example, if the mapping between Section and Course is n:1, a 1:1 mapping between instances of Section and graduate courses (courses with c#'s >= 5000), which means that only one section of each graduate course can be offered in any particular semester, can be stated as follows.

CONTEXT Section * Course [c# >= 5000]
MAPPING Section:Course = 1:1
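The effect of the intra-class condition can be sketched in Python: patterns whose Course fails the condition are simply left out before the 1:1 check is applied in both directions. The course identifiers and their c# values below are hypothetical.

```python
from collections import defaultdict


def one_to_one_violations(patterns, keep_course):
    """Check MAPPING Section:Course = 1:1 over (section, course) patterns,
    counting only patterns whose Course satisfies the intra-class condition."""
    courses_of, sections_of = defaultdict(set), defaultdict(set)
    for section, course in patterns:
        if keep_course(course):
            courses_of[section].add(course)
            sections_of[course].add(section)
    bad_sections = sorted(s for s, cs in courses_of.items() if len(cs) > 1)
    bad_courses = sorted(c for c, ss in sections_of.items() if len(ss) > 1)
    return bad_sections, bad_courses


c_number = {"dbms": 4301, "oodb": 6930, "kbms": 6100}  # hypothetical c# values
patterns = [
    ("sec1", "dbms"), ("sec2", "dbms"),  # undergraduate course: not constrained
    ("sec3", "oodb"), ("sec4", "oodb"),  # two sections of a graduate course
    ("sec5", "kbms"),
]
print(one_to_one_violations(patterns, lambda c: c_number[c] >= 5000))
# ([], ['oodb'])
```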

In some real world situations, a mapping constraint may exist between a pair of classes and a single class. For example, in Figure 3.1, let the mappings Student:Section and Section:Course be m:n and n:1, respectively. A real world constraint may state that a student cannot be registered in more than one of the sections of any particular course. In other words, the mapping (Student, Course):Section must be n:1. That is, any student-course pair can have only one corresponding section, while a section can have several student-course pairs. Such a constraint cannot be captured by existing OO models and the languages based on them. Using our approach, this constraint is defined as follows.

CONTEXT Student * Section * Course
MAPPING (Student, Course):Section = n:1

In general, a mapping between two sets of classes can be defined.
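A set-to-class mapping can be checked the same way as a class-to-class one, using the pair of objects as the key. The sketch below assumes each extensional pattern of Student * Section * Course is available as a hypothetical (student, section, course) tuple.

```python
from collections import defaultdict


def student_course_section_violations(patterns):
    """Check MAPPING (Student, Course):Section = n:1.

    Each student-course pair may map to only one section, while a section
    may still hold many student-course pairs.
    """
    sections_of = defaultdict(set)
    for student, section, course in patterns:
        sections_of[(student, course)].add(section)
    return sorted(pair for pair, secs in sections_of.items() if len(secs) > 1)


patterns = [
    ("amy", "sec1", "c6930"),
    ("bo", "sec1", "c6930"),   # one section, many student-course pairs: fine
    ("amy", "sec7", "c4301"),
    ("bo", "sec2", "c6930"),   # bo is registered in two sections of c6930
]
print(student_course_section_violations(patterns))  # [('bo', 'c6930')]
```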

6.6 Trigger Conditions and Corrective Actions

The complete structure of a rule definition is as follows, where the key words are written in upper case characters.

RULE Rule-id
TRIGGER-COND (<Trigger-time, Operation> pairs)
Rule-Body
CORRECTIVE-ACTION actions
END Rule-id

The Rule-id is a unique number (or name) given to a rule. A sequence of <Trigger-time, Operation> pairs is optionally specified in the Trigger-conditions (TRIGGER-COND) clause. Each of these pairs specifies an operation that causes the rule to be triggered and the triggering time with respect to the time of executing the operation. The possible triggering times are BEFORE, AFTER, and PARALLEL. The Rule-body is the actual integrity rule, and it is defined as described in the previous sections of this chapter. Actions to be taken if the rule is violated are specified in the Corrective-action clause. These actions may include sending an error message to the user, rejecting the operation that caused the rule to be triggered, or further updating the database to bring it back to a consistent state. Rejecting the update operation that violates the constraints is the default option.

The following is the complete definition of a rule stating that a TA whose classification is less than 8 cannot teach a graduate course (i.e., one with c# >= 5000). The corrective action of this rule is a message sent to the user.

RULE RIO
TRIGGER-COND (After Update (TA [classification]),
After Update (Course [c#]),
After Associate (Teacher, Section),
After Associate (Section, Course));
NOT EXIST (TA [classification <= 7] * Teacher * Section * Course [c# >= 5000])
CORRECTIVE-ACTION Message ("TA's Classification <= 7, should not teach a graduate course")
END RIO
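The interplay of trigger conditions, rule body, and corrective action can be sketched in Python. Everything here is a hypothetical model: operations are plain strings, the database is a dict, and the TA * Teacher * Section * Course patterns are pre-flattened into (ta, classification, c#) triples. Only AFTER triggering is exercised, and the corrective action is the message of rule RIO rather than the default rejection.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    rule_id: str
    triggers: set                          # <Trigger-time, Operation> pairs
    body: Callable[[dict], bool]           # True when the state satisfies the rule
    corrective_action: Callable[[], str]   # run only when the body is violated


def fire(rules, time, operation, db):
    """Evaluate the rules triggered by (time, operation) against db and
    return the results of the corrective actions of the violated ones."""
    return [r.corrective_action() for r in rules
            if (time, operation) in r.triggers and not r.body(db)]


def rio_body(db):
    # NOT EXIST (TA [classification <= 7] * Teacher * Section * Course [c# >= 5000])
    return not any(cls <= 7 and cno >= 5000 for _ta, cls, cno in db["ta_sections"])


rio = Rule(
    "RIO",
    {("AFTER", "Update(TA[classification])"), ("AFTER", "Update(Course[c#])"),
     ("AFTER", "Associate(Teacher, Section)"), ("AFTER", "Associate(Section, Course)")},
    rio_body,
    lambda: "TA's Classification <= 7, should not teach a graduate course",
)

db = {"ta_sections": [("kim", 8, 6930), ("lee", 6, 6930)]}
print(fire([rio], "AFTER", "Associate(Teacher, Section)", db))
```

An operation that is not listed among the trigger pairs never evaluates the body, which is why the designer (or an inference subsystem) must list every relevant operation.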

In this approach, it is the database designer's responsibility to specify in the Trigger-cond clause of a rule all the database operations relevant to the rule that may cause the database to enter an inconsistent state. This responsibility can be transferred to a subsystem that infers all possible trigger conditions relevant to a rule by examining the rule body.

6.7 Conventional Database Constraints

In this section, we show how conventional database constraints can be uniformly specified as constraints on patterns of object associations. We consider the Non-Null and Totality constraints as examples. Although high-level key words can be used to specify frequently used types of constraints (such as the Totality and Non-Null constraints), internally these constraints need to be represented as rules using the syntax introduced in this chapter. This makes it possible for the integrity rule subsystem to interpret and enforce these constraints.

Non-Null Constraints: In the definition of the class Transcript (see Figure 3.1), the attributes Student and Course can be specified to be Non-Null. This means that every Transcript instance must be associated with both a Student instance and a Course instance. This constraint can be expressed in our language as follows (where all the different occurrences of the class name Transcript in the rule are bound, at any given time, to the same Transcript instance).

IF Transcript
THEN Transcript * Student AND Transcript * Course
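A minimal Python sketch of what this rule checks, assuming a hypothetical representation in which each Transcript id maps to its (student, course) attribute values and `None` marks a missing association. Binding every occurrence of Transcript to the same instance corresponds to checking both attributes of one entry together.

```python
def non_null_violations(transcripts):
    """IF Transcript THEN Transcript * Student AND Transcript * Course."""
    return sorted(t for t, (student, course) in transcripts.items()
                  if student is None or course is None)


transcripts = {"tr1": ("amy", "c6930"), "tr2": ("bo", None), "tr3": (None, None)}
print(non_null_violations(transcripts))  # ['tr2', 'tr3']
```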

Totality Constraints: An attribute of a class can be constrained to be Total, which means that every object in the domain of the attribute must participate as an attribute value of some instance of the class. For example, if every department must offer some courses, the attribute Department of Course is said to be total. Such a constraint can be defined in the schema as part of the definition of the object class Course (see Appendix B) as follows (assuming that TOTAL is a key word in the language).

ENTITY_CLASS Course;
ASSOCIATION-SECTION
AGGREGATION OF (Department (TOTAL), ...)
END ASSOCIATION-SECTION
END Course;

This constraint needs to be translated by the system into the following form so that it can be processed uniformly by the underlying rule subsystem.

IF Department
THEN Department * Course
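The translated rule can be read as "every Department instance must appear in at least one Department * Course pattern". The Python sketch below assumes a hypothetical dict from Course ids to their Department attribute values.

```python
def totality_violations(departments, courses):
    """IF Department THEN Department * Course: every Department instance must
    be the Department attribute value of at least one Course instance."""
    offering = set(courses.values())
    return sorted(d for d in departments if d not in offering)


departments = ["CIS", "EE", "ME"]
courses = {"c6930": "CIS", "c4301": "CIS", "e3000": "EE"}
print(totality_violations(departments, courses))  # ['ME']
```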

Similar to the Totality and Non-Null constraints, set constraints on the subclasses of a class can also be expressed as constraints on object association patterns. For example, Constraint 3 in Section 3.1 above defines a set-exclusion constraint between the two subclasses of Grad (only for the graduate students who are majoring in the 'EE' department).


6.8 Conclusion

We have studied and defined the semantics of constraints in OO databases. Since an OO database can be viewed as a set of patterns of object associations, constraints can be defined to prevent the existence of certain patterns of associations among objects in the database. Constraints stating that certain objects should (or should not) exist in some patterns if they exist (or do not exist) in some other patterns can also be defined using our approach.

A set of constraints can share a common subdatabase; in this case, the subdatabase is defined first, followed by the constraints. If the database is updated such that a new instance enters the subdatabase (because, after the update, it satisfies the conditions used to define the subdatabase), that instance has to satisfy all the constraints specified for the subdatabase.

A mechanism for defining mapping constraints between instances of different classes, even if these classes are not directly associated, has also been described in this chapter.

Figure 6.1: A Graphical Representation of the Effect of a Constraint
(a) Possible Extensional Pattern Types that Include Grads who may be RA's and/or TA's
(b) A Non-Permissible Extensional Pattern Type as Specified by a Constraint


CHAPTER 7

THE IMPLEMENTATION OF AN OO KNOWLEDGE BASE MANAGEMENT SYSTEM

Several efforts have contributed to the development of an OO knowledge base management system (OKBMS) at the Database Systems Research and Development Center at the University of Florida [XIA89, DSO88, TY88, WU89]. The overall architecture of this system is shown in Figure 7.1.

In Section 7.1 of this chapter, we describe the overall implementation strategy. The current implementation status is briefly described in Section 7.2. Next, we describe some possible implementation techniques for some of the components of this system, such as the rule subsystem, i.e., deductive and integrity rules (Sections 7.3 and 7.4), and knowledge-based query optimization (Section 7.5).

7.1 An Overview

In this section, we describe the overall framework for implementing an OO knowledge base management system. The following are some of the modules or functionalities needed to implement such a system.

(1) The high-level user queries need to be mapped to query trees. A query tree is an internal executable form in which each node represents an operation that takes one or more inputs and produces an output. The output of a node is passed up as an input to another node in the query tree (unless it is the final result). The query tree represents a schedule for executing the operations that are necessary to evaluate the user's query.

(2) Semantic (knowledge-based) and conventional query optimization techniques need to be applied to the query tree to produce an optimized query tree that can be evaluated at minimal cost.

(3) The optimized query tree needs to be executed against the objects and associations stored in the database.

(4) The query tree needs to be validated statically and/or dynamically with respect to the integrity rules defined for the database to ensure that its execution will not bring the database into an inconsistent state. The application of a rule may further trigger other rules.

(5) In addition to the above, other issues need to be investigated, such as the form and structure in which a generated subdatabase can be permanently stored (if needed) and the formatting techniques needed to display a result to the user.
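The query-tree form described in item (1) can be sketched in Python: leaves hold stored associations, and each inner node applies an operation to the outputs of its children, evaluated bottom-up. The node names, the lambdas, and the flattened (section#, c#) data are hypothetical; they are not the link-algebra operators used by the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Leaf:
    """Stored objects or associations read by the tree."""
    name: str
    data: list

    def evaluate(self):
        return self.data


@dataclass
class Node:
    """An operation whose inputs are the outputs of its children."""
    name: str
    op: Callable
    children: list = field(default_factory=list)

    def evaluate(self):
        inputs = [child.evaluate() for child in self.children]  # bottom-up
        return self.op(*inputs)


# Hypothetical data for CONTEXT Section * Course [c# >= 5000], SELECT section#
section_course = [("s1", 6930), ("s2", 4301), ("s3", 6100)]  # (section#, c#)

tree = Node("project section#", lambda rows: [s for s, _ in rows], [
    Node("select c# >= 5000", lambda rows: [r for r in rows if r[1] >= 5000], [
        Leaf("Section * Course", section_course),
    ]),
])
print(tree.evaluate())  # ['s1', 's3']
```

Optimization (item 2) and constraint validation (item 4) would then operate by rewriting such a tree before it is evaluated.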

7.2 Current Implementation Status

The implementation of the graphical knowledge base definition facility and the graphic browser (see Figure 7.1) is described in [DSO88]. This implementation was carried out using the C programming language on a Sun Color Workstation. The SUNVIEW (Sun Visual/Interactive Environment for Workstations) package was used to generate the graphic objects for the interface. SUNVIEW is an OO system that supports interactive graphics-based applications running within windows. The objects in SUNVIEW are visual building blocks that can be assembled to form a user interface for an application.

The knowledge base definition facility allows the user to define the OSAM* schema of an application and detects any syntactic errors in the data definition. The graphic browser facilitates browsing both the schema and the data of the database. It provides several useful functions, such as Zoom, Unzoom, and Focus, which are used by the query module [TY88] to help browse the subset of the schema of interest to a query.

The design and implementation of the graphics interface for the OO query language OQL is presented in [TY88]. This interface allows the user to pose a query in graphics mode by using the graphics browser with some additional facilities. The interface then translates the query into its equivalent OQL textual representation so that it can be evaluated on the underlying Vbase system [WU89].

The implementation of OQL in the Vbase environment is described in [XIA89, WU89]. In this implementation, an OSAM* schema is mapped into an equivalent internal representation in Vbase. OQL queries are compiled into an internal executable form, i.e., a query tree. This internal form makes use of the operators of the link algebra described in [GUO89]. Operations specified in the Operation clause of an OQL query are then performed on the retrieved data.

7.3 Control Strategies for Deductive Rules

The two control strategies used in inferencing are forward and backward chaining of rules. In the backward chaining strategy, the evaluation of a derived subdatabase is delayed until a retrieval query that needs the derived data is issued. In contrast, in the forward chaining strategy, an up-to-date copy of the derived subdatabase is always kept available, which improves the performance of retrieval operations. In this strategy, whenever the data used to derive a subdatabase is updated (e.g., by associating, dissociating, or inserting objects), the relevant deductive rules are run to maintain consistency between the derived subdatabase and the original database.

In the relational system POSTGRES [STO87], only one of these two control strategies (i.e., forward or backward chaining of rules) is assigned to each rule in the system. A rule that is defined to follow the forward chaining strategy (i.e., a forward chaining rule) is executed whenever the data read by the rule is updated, and an up-to-date copy of the derived data is explicitly stored. A rule that is defined to follow the backward chaining strategy (i.e., a backward chaining rule) is triggered for execution whenever the data that the rule derives is requested (i.e., in a query), but the derived data is not preserved after the query session. In this rule-oriented control strategy, a rule is restricted to follow only one of the two control strategies at all times.

The disadvantage of this rule-oriented control strategy is that it imposes a restriction on the mixing of forward and backward chaining rules [STO87]: a forward chaining rule cannot read any data written by backward chaining rules. To illustrate this problem, let the following be a series of rules Ra to Rd and the results REa to REd derived by these rules.

DB --Ra--> REa --Rb--> REb --Rc--> REc --Rd--> REd

Also, let Ra and Rb be defined as backward chaining rules and Rc and Rd as forward chaining rules. If the original database DB is updated, rules Rc and Rd, though they are forward chaining rules, will not be triggered to update the result REd until someone requests the data of REb. Thus, REd may be inconsistent with the base data.

To overcome this problem, we use a result-oriented control strategy in which we specify for each result (derived subdatabase) whether it is to be pre-evaluated or post-evaluated. The same rule may follow the forward or backward chaining strategy depending on whether the derived subdatabase is to be pre- or post-evaluated.

To illustrate with the example above, assume that REd is defined as pre-evaluated and REb is defined as post-evaluated. Whenever the database DB is updated, the rules Ra, Rb, Rc, and Rd are triggered in forward chaining fashion to keep REd (which is explicitly stored) up-to-date. REb, on the other hand, is evaluated whenever a retrieval operation is issued against it. In this case, the rules Ra and Rb that derive REb are applied in backward chaining fashion. Thus, Ra and Rb follow one control strategy when deriving REd and the other control strategy when deriving REb. This technique offers more flexibility and alleviates the POSTGRES restriction described above.
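The result-oriented strategy can be sketched in Python for the linear chain Ra to Rd. The rules here are hypothetical toy functions over lists, results are numbered 0 to 3 for REa to REd, and recomputation is done from scratch rather than incrementally. The point illustrated is that the pre/post choice is attached to results, so Ra and Rb run forward when REd is refreshed and backward when REb is requested.

```python
class DerivationChain:
    """Result-oriented control for a chain DB -> RE0 -> RE1 -> ... (sketch).

    rules[i] derives result i from result i-1 (result 0 from the base data).
    Results listed in pre_evaluated are stored and refreshed on every base
    update (forward chaining); the others are derived on request (backward
    chaining) and never stored.
    """

    def __init__(self, base, rules, pre_evaluated):
        self.base = base
        self.rules = rules
        self.pre = set(pre_evaluated)
        self.stored = {}
        self.refresh()

    def _derive(self, i):
        # Derive the input of rule i first, then apply rule i.
        source = self.base if i == 0 else self._derive(i - 1)
        return self.rules[i](source)

    def refresh(self):
        # Every rule feeding a pre-evaluated result runs, even if the
        # intermediate results it produces are post-evaluated.
        for i in self.pre:
            self.stored[i] = self._derive(i)

    def update(self, new_base):
        self.base = new_base
        self.refresh()

    def get(self, i):
        return self.stored[i] if i in self.pre else self._derive(i)


def ra(xs):
    return [x + 1 for x in xs]


def rb(xs):
    return [x * 2 for x in xs]


def rc(xs):
    return [x for x in xs if x > 4]


def rd(xs):
    return sum(xs)


chain = DerivationChain([1, 2, 3], [ra, rb, rc, rd], pre_evaluated=[3])  # REd stored
chain.update([5, 6])
print(chain.stored[3])  # 26: REd was refreshed by the update itself
print(chain.get(1))     # [12, 14]: REb is derived only when requested
```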
7.4 Integrity Constraints

The implementation of the integrity constraint subsystem is described in [SIN89]. In this implementation, each constraint is compiled into an internal executable form (a query tree, which may be enhanced with some additional operators to enable the representation of constraints). When a user's query is compiled into a query tree, all the constraints that are relevant to the query (i.e., the constraints that may be violated as a result of executing the query) are triggered for execution. The corresponding query trees for these constraints are used to modify the query tree of the user's query, yielding a modified query tree in which all the constraints have been accounted for. The modified query tree is then evaluated against the database. The details of this process can be found in [SIN89].

7.5 Knowledge-based Query Optimization

Knowledge-based query optimization (sometimes referred to as semantic query optimization) is a process in which a query is transformed into a semantically equivalent one whose evaluation is less costly. Two queries are said to be semantically equivalent if they produce the same result in every database state that satisfies the integrity constraints [KINJ84]. This transformation process makes use of the knowledge about the application domain that is captured in the schema and in the constraints. Such optimization techniques are expected to play a more important role in OO knowledge base systems than in relational database systems, because much more of the semantics of an application world can be represented in an OO system.

In the following, we demonstrate how semantic query optimization can be effective in reducing the evaluation costs of queries. Let the following be a constraint defined for the schema of Figure 3.1.

Constraint: A student can also be a teacher at the same time only if he/she is a TA.

IF Teacher * Person * Student
THEN Teacher * Person * Student * Grad * TA

This means that for an extensional pattern of the type <Teacher, Person, Student> to exist in the database, it has to be part of a larger pattern of the type <Teacher, Person, Student, Grad, TA>. Let the following be a query that retrieves the section#'s of the sections taught by TA's, together with the classifications and ss#'s of those TA's.

CONTEXT Section * Teacher * TA * Grad * Student * Person
SELECT section#, ss#, classification
DISPLAY

If we use the number of associations to be traversed as a high-level measure of the evaluation cost, the cost of evaluating this query as given above is 5 units. Based on the semantics of the generalization association and the semantics captured by the constraint above, this query can be transformed into the following semantically equivalent query.

CONTEXT Section * Teacher * Person * Student
SELECT section#, ss#, classification
DISPLAY

The cost of evaluating this query is 3 units, since only three associations need to be traversed, which makes it more efficient than the originally posed query.
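The cost arithmetic of this example can be reproduced with a short Python sketch. The rewrite rule is hypothetical: it was written by hand from the constraint and the generalization semantics, and the sketch only applies it as a subpath substitution; it does not derive such rules from the knowledge base.

```python
def cost(path):
    """Number of associations traversed by a linear Context expression."""
    return len(path) - 1


def rewrite(path, rules):
    """Replace the first subpath matching a rule's left side by its right side."""
    for lhs, rhs in rules:
        n = len(lhs)
        for i in range(len(path) - n + 1):
            if tuple(path[i:i + n]) == lhs:
                return path[:i] + list(rhs) + path[i + n:]
    return path


# Hypothetical rewrite rule justified by the constraint above.
RULES = [(("Teacher", "TA", "Grad", "Student", "Person"),
          ("Teacher", "Person", "Student"))]

query = ["Section", "Teacher", "TA", "Grad", "Student", "Person"]
optimized = rewrite(query, RULES)
print(optimized, cost(query), cost(optimized))
# ['Section', 'Teacher', 'Person', 'Student'] 5 3
```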
Figure 7.1: The Overall Architecture of the OKBMS System


CHAPTER 8
CONCLUSIONS

Due to their semantic expressiveness, OO and semantic data models have been used in recent years to support many new database application domains that cannot easily be supported by record-oriented data models and systems (e.g., relational). This has motivated research in different areas related to OO data models and to the OO database systems that can be built on them. These areas include, but are not limited to, query language design, constraint specification, deductive reasoning, efficient storage and retrieval of objects, and constraint enforcement techniques. The work presented in this dissertation concentrates on some of these research areas.

In this dissertation, we have introduced a query model for object-oriented databases that maintains the closure property, which is not supported by any of the existing query languages that have been introduced for OO and semantic data models. The result of a query in our query model is a subdatabase that is structurally represented using an OO data model. This allows a resulting subdatabase to be further manipulated uniformly to produce other subdatabases that satisfy additional qualification conditions.

Based on this query model, we have developed the OO query language OQL. In OQL, it is possible to identify a subdatabase of interest using a relatively small number of operators. Logical AND and OR operators can be used to specify branching association pattern expressions, which makes it easy to retrieve subdatabases with complex intensional and extensional association patterns. The concept of outer join is captured in OQL using braces that enclose a subexpression of an association pattern expression. In addition, we have introduced the notion of union-compatible subdatabases and defined the semantics of set operators as applied to union-compatible subdatabases. Multiple system-defined as well as user-defined operations can be specified and performed on the classes of a subdatabase.

We also introduced in this dissertation a knowledge definition language that allows the knowledge pertaining to an OO application domain to be represented in the form of deductive rules and integrity rules (constraints). Deductive rules in our language derive new patterns of associations among objects based on existing or other derived patterns. Rules may form inference chains in which a rule derives new patterns based on the patterns derived by the immediately preceding rule(s) in the chain. Integrity rules can be defined using our knowledge definition language to specify the patterns of object associations in which objects of some classes must or must not fall.

The knowledge definition language is tightly coupled with the object-oriented query language OQL in the sense that the constructs of OQL are used in the definition of rules (the IF part of a rule can be an OQL query that checks whether the objects in the database satisfy some given conditions), and the data that deductive rules derive can be queried by OQL in a uniform way. The tight coupling of these two languages provides a basis for the implementation of OO knowledge base management systems (OKBMSs), which integrate concepts and techniques typically found in different categories of systems, namely, database management systems and expert systems. Some implementation techniques for developing OKBMSs have been presented in this dissertation.

The following are some of the lines along which future research relevant to the topic of this dissertation may be conducted.

(1) The study of knowledge-based query optimization techniques in an object-oriented environment.

(2) The design of other query languages based on our closed query model that differ from OQL in flavor, syntax, and user orientation.

(3) The different strategies that can be adopted for compiling deductive and integrity rules into an executable internal representation.

(4) The efficient implementation of deductive rules and constraints.

(5) Storage structure issues and optimization techniques for OO databases.


APPENDIX A

GRAMMAR OF THE OO QUERY LANGUAGE OQL

In this appendix, we provide the grammar for the object-oriented query language OQL, which is introduced in this dissertation. Some of the constructs of this language are used in the knowledge definition language, whose grammar is given in Appendix B.

OQL-query ::= subdb-definition operation ";"

subdb-definition ::=
        CONTEXT ape
        [WHERE conditions]
        [SELECT target-list]

ape ::= linear-ape [ op logical-op "(" linear-ape "," linear-ape ")"
        [ op logical-op linear-ape ] ]

linear-ape ::= class-name [ "[" intra-cond "]" ]
        [ op linear-ape
        | op assoc-pat-subexpression ] ";"

assoc-pat-subexpression ::= linear-ape

op ::= * | ! ";"

logical-op ::= AND | OR

intra-cond ::= attribute-name scalar-op value ";"

scalar-op ::= < | <= | = | >= | > ";"

conditions ::= attribute-reference scalar-op attribute-reference ";"

attribute-reference ::= attribute-name
        | class-name "[" attribute-name "]"

target-list ::= target-reference { target-reference }

target-reference ::= attribute-reference
        | class-name ";"

operations ::= operation { operation }

operation ::= UPDATE argument1
        | INSERT argument1
        | DELETE argument2
        | DISPLAY argument3
        | PRINT argument3
        | ASSOCIATE argument4
        | DISSOCIATE argument4

argument1 ::= attribute-reference value
        { ";" attribute-reference value }

argument2 ::= class-name { "," class-name }

argument3 ::= [ class-name ] ";"

argument4 ::= "(" key-attribute-reference
        key-attribute-reference ")"

key-attribute-reference ::= key-attribute-name
        | class-name "[" key-attribute-name "]"
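To make the association-pattern part of this grammar more concrete, the following Python sketch parses only the linear-ape subset: a class name with an optional bracketed intra-class condition, followed by any number of op / class-name pairs. It is a minimal recursive-descent illustration under stated assumptions, not the OQL implementation described in this dissertation. The token syntax (names and values built from letters, digits, "_", "#", and "."), the flattening of the right-recursive [ op linear-ape ] into a loop, and the names parse_linear_ape, ClassRef, IntraCond, and LinearApe are all hypothetical; branching with AND/OR, the ";" terminators, WHERE, SELECT, and the operations are not handled.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed token syntax: <= and >= first, then single-character symbols,
# then names/values made of letters, digits, '_', '#', and '.'.
TOKEN_RE = re.compile(r"\s*(<=|>=|[<>=*!\[\]]|[A-Za-z0-9_#.]+)")
SCALAR_OPS = {"<", "<=", "=", ">=", ">"}   # scalar-op
ASSOC_OPS = {"*", "!"}                     # op
SYMBOLS = SCALAR_OPS | ASSOC_OPS | {"[", "]"}


def tokenize(text: str) -> List[str]:
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


@dataclass
class IntraCond:          # intra-cond ::= attribute-name scalar-op value
    attribute: str
    op: str
    value: str


@dataclass
class ClassRef:           # class-name [ "[" intra-cond "]" ]
    name: str
    cond: Optional[IntraCond] = None


@dataclass
class LinearApe:
    classes: List[ClassRef]
    ops: List[str]        # ops[i] links classes[i] and classes[i + 1]


class Parser:
    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None) -> str:
        # Consume one token, optionally requiring it to be in `expected`.
        tok = self.peek()
        if tok is None or (expected is not None and tok not in expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.i += 1
        return tok

    def name(self) -> str:
        tok = self.take()
        if tok in SYMBOLS:
            raise SyntaxError(f"expected a name, got {tok!r}")
        return tok

    def class_ref(self) -> ClassRef:
        name = self.name()
        if self.peek() != "[":
            return ClassRef(name)
        self.take({"["})
        cond = IntraCond(self.name(), self.take(SCALAR_OPS), self.name())
        self.take({"]"})
        return ClassRef(name, cond)

    def linear_ape(self) -> LinearApe:
        # The right-recursive [ op linear-ape ] is flattened into a loop.
        classes, ops = [self.class_ref()], []
        while self.peek() in ASSOC_OPS:
            ops.append(self.take(ASSOC_OPS))
            classes.append(self.class_ref())
        return LinearApe(classes, ops)


def parse_linear_ape(text: str) -> LinearApe:
    parser = Parser(text)
    ape = parser.linear_ape()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return ape
```

For example, with the hypothetical classes Teacher, Section, and Course, parse_linear_ape("Teacher[degree = PhD] * Section ! Course") yields three class references joined by the operators * and !, with the condition degree = PhD attached to Teacher only.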


APPENDIX B

GRAMMAR OF THE KNOWLEDGE DEFINITION LANGUAGE

In this appendix, we describe the grammar of the
knowledge definition language, which can be used as part
of defining a knowledge base schema for an application.
The structural constructs presented in this grammar are
those provided by the OO semantic association model
OSAM* [SU88].

schema-declaration ::=
        SCHEMA schema-name ";"
        domain-declaration ";"
        entity-declaration ";"
        END schema-name ";"

domain-declaration ::=
        DOMAIN-CLASSES ";"
        class-name ":" data-type
        { class-name ":" data-type }
        END DOMAIN-CLASSES ";"

data-type ::= INTEGER | REAL | STRING | CHARACTER | BOOLEAN | COMPUTE

entity-declaration ::=
        ENTITY-CLASSES
        entity-class-block { ";" entity-class-block }
        END ENTITY-CLASSES

entity-class-block ::=
        ENTITY-CLASS class-name ";"
        association-declaration ";"
        operation-declaration ";"
        constraint-declaration
        END class-name ";"

association-declaration ::=
        ASSOCIATION-SECTION
        { association-specification }
        END ASSOCIATION-SECTION ";"

association-specification ::= generalization-declaration
        | aggregation-declaration
        | interaction-declaration
        | crossproduct-declaration
        | composition-declaration

generalization-declaration ::=
        GENERALIZATION OF "(" class-name { class-name } ")"
        g-constraints

g-constraints ::= class-name "," class-name ":" g-const
        { ";" class-name "," class-name ":" g-const }

g-const ::= SX | SI | SE | ST-SS

aggregation-declaration ::=
        AGGREGATION OF "(" attribute-declaration
        { ";" attribute-declaration } ")"

attribute-declaration ::= attribute-name class-name "(" A-const ")"

A-const ::= OPTIONAL | TOTAL

interaction-declaration ::=
        INTERACTION OF "(" class-name class-name { ";" class-name } ")"
        i-constraints

i-constraints ::= class-name "," class-name ":" cardinality
        { ";" class-name "," class-name ":" cardinality }

cardinality ::= 1:1 | 1:N | M:N | N:1

crossproduct-declaration ::=
        CROSSPRODUCT OF "(" class-name { ";" class-name } ")"

composition-declaration ::=
        COMPOSITION OF "(" class-name { class-name } ")"

operation-declaration ::=
        OPERATION-SECTION
        operation-specification
        { operation-specification }
        END OPERATION-SECTION

operation-specification ::= ufunction | uoperation

ufunction ::= function-name "(" arguments ")" ":" data-type

uoperation ::= operation-name "(" arguments ")"

arguments ::= variable-name ":" class-name
        { ";" variable-name ":" class-name }

constraint-declaration ::=
        CONSTRAINT-SECTION
        constraint { ":" constraint }
        END CONSTRAINT-SECTION

constraint ::=
        RULE <rule-id>
        TRIGGER-COND trigger-specification
        rule-body
        CORRECTIVE-ACTION action-specification
        END <rule-id>

trigger-specification ::=
        "(" option operation { "," option operation } ")" ";"

option ::= BEFORE | AFTER | PARALLEL ";"

rule-body ::= not-exist-expression
        | if-then-else-rule
        | cardinality-const ";"

not-exist-expression ::= NOT EXIST "(" subdb-definition ")"

if-then-else-rule ::=
        IF [NOT EXIST] subdb-definition
        THEN [NOT EXIST] subdb-definition
        ELSE [NOT EXIST] subdb-definition

action-specification ::= operation-name

operation-name ::= operation | message-spec

message-spec ::= MESSAGE message-body

message-body ::= "(" text ")"

cardinality-const ::=
        subdb-definition
        MAPPING classlist classlist cardinality

classlist ::= "(" class-name { class-name } ")"

cardinality ::= 1:1 | 1:N | M:N | N:1

deductive-rule ::= IF subdb-definition
        THEN derived-data

derived-data ::= subdb-id (classlist)
        | attribute-computation

subdb-id ::= STRING

attribute-computation ::= attribute-name computation-expression

computation-expression ::= attribute-name
        | COUNT "(" class-name BY class-name ")"
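The cardinality-const and COUNT constructs above are defined only syntactically. As a hedged illustration, the Python sketch below assumes that the association between the two classlists of a MAPPING clause can be viewed as a set of (left object, right object) identifier pairs, reads 1:1, 1:N, N:1, and M:N as the usual upper bounds on how many partners an object may have, and assumes that COUNT(X BY Y) counts the distinct X objects associated with each Y object. The functions satisfies_cardinality and count_by are hypothetical names, not part of the knowledge definition language.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

# An association between two object classes, modeled (as an assumption)
# as a set of (left_object_id, right_object_id) pairs.
Pair = Tuple[Hashable, Hashable]


def satisfies_cardinality(pairs: Iterable[Pair], cardinality: str) -> bool:
    """Check pairs against 1:1, 1:N, N:1, or M:N (upper bounds only)."""
    pairs = set(pairs)                         # ignore duplicate pairs
    per_left = Counter(a for a, _ in pairs)    # right partners per left object
    per_right = Counter(b for _, b in pairs)   # left partners per right object
    # Each right object has at most one left partner (the "1" in 1:N).
    left_unique = all(n <= 1 for n in per_right.values())
    # Each left object has at most one right partner (the "1" in N:1).
    right_unique = all(n <= 1 for n in per_left.values())
    checks = {
        "1:1": left_unique and right_unique,
        "1:N": left_unique,
        "N:1": right_unique,
        "M:N": True,
    }
    if cardinality not in checks:
        raise ValueError(f"unknown cardinality {cardinality!r}")
    return checks[cardinality]


def count_by(pairs: Iterable[Pair]) -> Dict[Hashable, int]:
    """Assumed reading of COUNT(X BY Y): pairs are (x, y); count distinct x per y."""
    return dict(Counter(y for _, y in set(pairs)))
```

Only upper bounds are checked; a minimum-participation requirement (for example, that every object have at least one partner) would need the full extension of each class as an additional input. Likewise, Y objects with no associated X objects do not appear in the result of count_by.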